AI Agents in Action

Micheal Lanham

To comment go to livebook.

Manning

Shelter Island

For more information on this and other Manning titles go to manning.com.

copyright

AI Agents in Action

Manning Publications Co.

20 Baldwin Road

PO Box 761

Shelter Island, NY 11964

Development editor: Becky Whitney

Technical editor: Ross Turner

Review editor: Kishor Rit

Production editor: Keri Hales

Copy editor: Julie McNamee

Proofreader: Katie Tennant

Technical proofreader: Ross Turner

Typesetter: Dennis Dalinnik

Cover designer: Marija Tudor

dedication

I dedicate this book to all the readers who embark on this journey with me.

Books are a powerful way for an author to connect with readers on a deeply personal

level, chapter by chapter, page by page. In that shared experience of learning,

exploring, and growing together, I find true meaning. May this book inspire you

and challenge you, and help you see the incredible potential that AI agents hold—

not just for the future but also for today.

contents

preface

My journey into the world of intelligent systems began back in the early 1980s. Like many people then, I believed artificial intelligence (AI) was just around the corner. It always seemed like one more innovation and technological leap would lead us to the intelligence we imagined. But that leap never came.

Perhaps the promise of HAL, from Stanley Kubrick’s 2001: A Space Odyssey, captivated me with the idea of a truly intelligent computer companion. After years of effort, trial, and countless errors, I began to understand that creating AI was far more complex than we humans had imagined. In the early 1990s, I shifted my focus, applying my skills to more tangible goals in other industries.

Not until the late 1990s, after experiencing a series of challenging and transformative events, did I realize my passion for building intelligent systems. I knew these systems might never reach the superintelligence of HAL, but I was okay with that. I found fulfillment in working with machine learning and data science, creating models that could learn and adapt. For more than 20 years, I thrived in this space, tackling problems that required creativity, precision, and a sense of possibility.

During that time, I worked on everything from genetic algorithms for predicting unknown inputs to developing generative learning models for horizontal drilling in the oil-and-gas sector. These experiences led me to write, where I shared my knowledge by way of books on various topics—reverse-engineering Pokémon Go, building augmented and virtual reality experiences, designing audio for games, and applying reinforcement learning to create intelligent agents. I spent years knuckles-deep in code, developing agents in Unity ML-Agents and deep reinforcement learning.

Even then, I never imagined that one day I could simply describe what I wanted to an AI model, and it would make it happen. I never imagined that, in my lifetime, I would be able to collaborate with an AI as naturally as I do today. And I certainly never imagined how fast—and simultaneously how slow—this journey would feel.

In November 2022, the release of ChatGPT changed everything. It changed the world’s perception of AI, and it changed the way we build intelligent systems. For me, it also altered my perspective on the capabilities of these systems. Suddenly, the idea of agents that could autonomously perform complex tasks wasn’t just a far-off dream but instead a tangible, achievable reality. In some of my earlier books, I had described agentic systems that could undertake specific tasks, but now, those once-theoretical ideas were within reach.

This book is the culmination of my decades of experience in building intelligent systems, but it’s also a realization of the dreams I once had about what AI could become. AI agents are here, poised to transform how we interact with technology, how we work, and, ultimately, how we live.

Yet, even now, I see hesitation from organizations when it comes to adopting agentic systems. I believe this hesitation stems not from fear of AI but rather from a lack of understanding and expertise in building these systems. I hope that this book helps to bridge that gap. I want to introduce AI agents as tools that can be accessible to everyone—tools we shouldn’t fear but instead respect, manage responsibly, and learn to work with in harmony.

acknowledgments

I want to extend my deepest gratitude to the machine learning and deep learning communities for their tireless dedication and incredible work. Just a few short years ago, many questioned whether the field was headed for another AI winter—a period of stagnation and doubt. But thanks to the persistence, brilliance, and passion of countless individuals, the field not only persevered but also flourished. We’re standing on the threshold of an AI-driven future, and I am endlessly grateful for the contributions of this talented community.

Writing a book, even with the help of AI, is no small feat. It takes dedication, collaboration, and a tremendous amount of support. I am incredibly thankful to the team of editors and reviewers who made this book possible. I want to express my heartfelt thanks to everyone who took the time to review and provide feedback. In particular, I want to thank Becky Whitney, my content editor, and Ross Turner, my technical editor and chief production and technology officer at OpenSC, for their dedication, as well as the whole production team at Manning for their insight and unwavering support throughout this journey.

To my partner, Rhonda—your love, patience, and encouragement mean the world to me. You’ve been the cornerstone of my support system, not just for this book but for all the books that have come before. I truly couldn’t have done any of this without you. Thank you for being my rock, my partner, and my inspiration.

Many of the early ideas for this book grew out of my work at Symend. It was during my time there that I first began developing the concepts and designs for agentic systems that laid the foundation for this book. I am deeply grateful to my colleagues at Symend for their collaboration and contributions, including Peh Teh, Andrew Wright, Ziko Rajabali, Chris Garrett, Kouros, Fatemeh Torabi Asr, Sukh Singh, and Hanif Joshaghani. Your insights and hard work helped bring these ideas to life, and I am honored to have worked alongside such an incredible group of people.

Finally, I would like to thank all the reviewers: Anandaganesh Balakrishnan, Aryan Jadon, Chau Giang, Dan Sheikh, David Curran, Dibyendu Roy Chowdhury, Divya Bhargavi, Felipe Provezano Coutinho, Gary Pass, John Williams, Jose San Leandro, Laurence Giglio, Manish Jain, Maxim Volgin, Michael Wang, Mike Metzger, Piti Champeethong, Prashant Dwivedi, Radhika Kanubaddhi, Rajat Kant Goel, Ramaa Vissa, Richard Vaughan, Satej Kumar Sahu, Sergio Gtz, Siva Dhandapani, Annamaneni Sriharsha, Sri Ram Macharla, Sumit Bhattacharyya, Tony Holdroyd, Vidal Graupera, Vidhya Vinay, and Vinoth Nageshwaran. Your suggestions helped make this a better book.

about this book

AI Agents in Action is about building and working with intelligent agent systems—not just creating autonomous entities but also developing agents that can effectively tackle and solve real-world problems. The book starts with the basics of working with large language models (LLMs) to build assistants, multi-agent systems, and agentic behavioral agents. From there, it explores the key components of agentic systems: retrieval systems for knowledge and memory augmentation, action and tool usage, reasoning, planning, evaluation, and feedback. The book demonstrates how these components empower agents to perform a wide range of complex tasks through practical examples.

This journey isn’t just about technology; it’s about reimagining how we approach problem solving. I hope this book inspires you to see intelligent agents as partners in innovation, capable of transforming ideas into actions in ways that were once thought impossible. Together, we’ll explore how AI can augment human potential, enabling us to achieve far more than we could alone.

Who should read this book

This book is for anyone curious about intelligent agents and how to develop agentic systems—whether you’re building your first helpful assistant or diving deeper into complex multi-agent systems. No prior experience with agents, agentic systems, prompt engineering, or working with LLMs is required. All you need is a basic understanding of Python and familiarity with GitHub repositories. My goal is to make these concepts accessible and engaging, empowering anyone who wants to explore the world of AI agents to do so with confidence.

Whether you’re a developer, researcher, or hobbyist or are simply intrigued by the possibilities of AI, this book is for you. I hope that in these pages you’ll find inspiration, practical guidance, and a new appreciation for the remarkable potential of intelligent agents. Let this book guide you in understanding, creating, and unleashing the power of AI agents in action.

How this book is organized: A road map

This book has 11 chapters. Chapter 1, “Introduction to agents and their world,” begins by laying a foundation with fundamental definitions of large language models, chat systems, assistants, and autonomous agents. As the book progresses, the discussion shifts to the key components that make up an agent and how these components work together to create truly effective systems. Here is a quick summary of chapters 2 through 11:

  • Chapter 2, “Harnessing the power of large language models”—We start by exploring how to use commercial LLMs, such as OpenAI. We then examine tools, such as LM Studio, that provide the infrastructure and support for running various open source LLMs, enabling anyone to experiment and innovate.
  • Chapter 3, “Engaging GPT assistants” —This chapter dives into the capabilities of the GPT Assistants platform from OpenAI. Assistants are foundational agent types, and we explore how to create practical and diverse assistants, from culinary helpers to intern data scientists and even a book learning assistant.
  • Chapter 4, “Exploring multi-agent systems” —Agentic tools have advanced rapidly. Here, we explore two sophisticated multi-agent systems: CrewAI and AutoGen. We demonstrate AutoGen’s ability to develop code autonomously and see how CrewAI can bring together a group of joke researchers to create humor collaboratively.
  • Chapter 5, “Empowering agents with actions” —Actions are fundamental to any agentic system. This chapter discusses how agents can use tools and functions to execute actions, ranging from database and application programming interface (API) queries to generating images. We focus on enabling agents to take meaningful actions autonomously.
  • Chapter 6, “Building autonomous assistants” —We explore the behavior tree—a staple in robotics and game systems—as a mechanism to orchestrate multiple coordinated agents. We’ll use behavior trees to tackle challenges such as code competitions and social media content creation.
  • Chapter 7, “Assembling and using an agent platform” —This chapter introduces Nexus, a sophisticated platform for orchestrating multiple agents and LLMs. We discuss how Nexus facilitates agentic workflows and enables complex interactions between agents, providing an example of a fully functioning multi-agent environment.
  • Chapter 8, “Understanding agent memory and knowledge” —Retrieval-augmented generation (RAG) has become an essential tool for extending the capabilities of LLM agents. This chapter explores how retrieval mechanisms can serve both as a source of knowledge, by processing ingested files, and as a source of memory, allowing agents to recall previous interactions or events.
  • Chapter 9, “Mastering agent prompts with prompt flow” —Prompt engineering is central to an agent’s success. This chapter introduces prompt flow, a tool from Microsoft that helps automate the testing and evaluation of prompts, enabling more robust and effective agentic behavior.
  • Chapter 10, “Agent reasoning and evaluation ”—Reasoning is crucial to solving problems intelligently. In this chapter, we explore various reasoning techniques, such as chain of thought (CoT), and show how agents can evaluate reasoning strategies even during inference, improving their capacity to solve problems autonomously.
  • Chapter 11, “Agent planning and feedback” —Planning is perhaps an agent’s most critical skill in achieving its goals. We discuss how agents can incorporate planning to navigate complex tasks and how feedback loops can be used to refine those plans. The chapter concludes by integrating all the key components—actions, memory and knowledge, reasoning, evaluation, planning, and feedback—into practical examples of agentic systems that solve real-world problems.

About the code

The code for this book is spread across several open source projects, many of which are hosted by me or by other organizations in GitHub repositories. Throughout this book, I strive to make the content as accessible as possible, taking a low-code approach to help you focus on core concepts. Many chapters demonstrate how simple prompts can generate meaningful code, showcasing the power of AI-assisted development.

Additionally, you’ll find a variety of assistant profiles and multi-agent systems that demonstrate how to solve real-world problems using generated code. These examples are meant to inspire, guide, and empower you to explore what is possible with AI agents. I am deeply grateful to the many contributors and the community members who have collaborated on these projects, and I encourage you to explore the repositories, experiment with the code, and adapt it to your own needs. This book is a testament to the power of collaboration and the incredible things we can achieve together.

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes, some of the code is typeset in bold to highlight code that has changed from previous steps in the chapter, such as when a feature is added to an existing line of code. In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In some cases, even this wasn’t enough, and listings include line-continuation markers. Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/ai-agents-in-action. The complete code for the examples in the book is available for download from the Manning website at www.manning.com/books/ai-agents-in-action. In addition, the code developed for this book has been placed in three GitHub repositories that are all publicly accessible:

liveBook discussion forum

Purchase of AI Agents in Action includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/ai-agents-in-action/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It isn’t a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

about the author

Micheal Lanham is a distinguished software and technology innovator with more than two decades of experience in the industry. He has an extensive background in developing various software applications across several domains, such as gaming, graphics, web development, desktop engineering, AI, GIS, oil and gas geoscience/geomechanics, and machine learning. Around the turn of the millennium, Micheal began pioneering work in integrating neural networks and evolutionary algorithms into game development. He has authored multiple influential books exploring deep learning, game development, and augmented reality, including Evolutionary Deep Learning (Manning, 2023) and Augmented Reality Game Development (Packt Publishing, 2017). He has contributed to the tech community via publications with many significant tech publishers, including Manning. Micheal resides in Calgary, Alberta, Canada, with his large family, whom he enjoys cooking for.

about the cover illustration

The figure on the cover of AI Agents in Action is “Clémentinien,” taken from Balthasar Hacquet’s Illustrations de L’Illyrie et la Dalmatie, published in 1815.

In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.

1 Introduction to agents and their world

This chapter covers

  • Defining the concept of agents
  • Differentiating the components of an agent
  • Analyzing the rise of the agent era: Why agents?
  • Peeling back the AI interface
  • Navigating the agent landscape

The agent isn’t a new concept in machine learning and artificial intelligence (AI). In reinforcement learning, for instance, the word agent denotes an active decision-making and learning intelligence. In other areas, the word agent aligns more with an automated application or software that does something on your behalf.

1.1 Defining agents

You can consult any online dictionary to find the definition of an agent. The Merriam-Webster Dictionary defines it this way (www.merriam-webster.com/dictionary/agent):

  • One that acts or exerts power
  • Something that produces or can produce an effect
  • A means or instrument by which a guiding intelligence achieves a result

The word agent in our journey to build powerful agents in this book uses this dictionary definition. That also means the term assistant will be synonymous with agent. Tools like OpenAI’s GPT Assistants will also fall under the AI agent blanket. OpenAI avoids the word agent because of the history of machine learning, where an agent is self-deciding and autonomous.

Figure 1.1 shows four cases where a user may interact with a large language model (LLM) directly or through an agent/assistant proxy, an agent/assistant, or an autonomous agent. These four use cases are highlighted in more detail in this list:

  • Direct user interaction —If you used earlier versions of ChatGPT, you experienced direct interaction with the LLM. There is no proxy agent or other assistant interjecting on your behalf.
  • Agent/assistant proxy —If you’ve used Dall-E 3 through ChatGPT, then you’ve experienced a proxy agent interaction. In this use case, an LLM interjects your requests and reformulates them in a format better designed for the task. For example, for image generation, ChatGPT better formulates the prompt. A proxy agent is an everyday use case to assist users with unfamiliar tasks or models.
  • Agent/assistant —If you’ve ever used a ChatGPT plugin or GPT assistant, then you’ve experienced this use case. In this case, the LLM is aware of the plugin or assistant functions and prepares to make calls to this plugin/function. However, before making a call, the LLM requires user approval. If approved, the plugin or function is executed, and the results are returned to the LLM. The LLM then wraps this response in natural language and returns it to the user.
  • Autonomous agent —In this use case, the agent interprets the user’s request, constructs a plan, and identifies decision points. From this, it executes the steps in the plan and makes the required decisions independently. The agent may request user feedback after certain milestone tasks, but it’s often given free rein to explore and learn if possible. This agent poses the most ethical and safety concerns, which we’ll explore later.
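
The agent/assistant use case above can be sketched in a few lines of Python. This is a minimal, framework-agnostic illustration, not code from the book or from any specific LLM API: `fake_llm`, `get_weather`, and the approval callback are made-up names standing in for a real model call, a real tool, and a real user prompt.

```python
# Sketch of the agent/assistant flow: the LLM proposes a tool call, the user
# approves it, the tool executes, and the result is returned as a reply.

def get_weather(city: str) -> str:
    """A toy tool the agent can call."""
    return f"It is sunny in {city}."

TOOLS = {"get_weather": get_weather}

def fake_llm(request: str) -> dict:
    # A real system would ask the LLM which tool to call; here it's hardcoded.
    return {"tool": "get_weather", "args": {"city": "Calgary"}}

def assistant(request: str, approve=lambda call: True) -> str:
    call = fake_llm(request)                      # LLM proposes a tool call
    if not approve(call):                         # user must approve execution
        return "Tool call declined by user."
    result = TOOLS[call["tool"]](**call["args"])  # execute the tool
    return f"Assistant: {result}"                 # LLM would wrap this reply

print(assistant("What's the weather in Calgary?"))
```

Note that the approval step is exactly what separates this use case from the autonomous agent, which would skip the `approve` callback entirely.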
Figure 1.1 The differences between direct interaction and LLM interactions that use an agent/assistant proxy, an agent/assistant, or an autonomous agent

Figure 1.1 demonstrates the use cases for a single flow of actions on an LLM using a single agent. For more complex problems, we often break agents into profiles or personas. Each agent profile is given a specific task and executes that task with specialized tools and knowledge.

Multi-agent systems are agent profiles that work together in various configurations to solve a problem. Figure 1.2 demonstrates an example of a multi-agent system using three agents: a controller or proxy and two profile agents as workers controlled by the proxy. The coder profile on the left writes the code the user requests; on the right is a tester profile designed to write unit tests. These agents work and communicate together until they are happy with the code and then pass it on to the user.

Figure 1.2 shows one of the possibly infinite agent configurations. (In chapter 4, we’ll explore Microsoft’s open source platform, AutoGen, which supports multiple configurations for employing multi-agent systems.)

Figure 1.2 In this example of a multi-agent system, a controller or proxy agent communicates directly with the user. Two agents, a coder and a tester, work behind the scenes to write the code and the unit tests that exercise it.

Multi-agent systems can work autonomously but may also function guided entirely by human feedback. The benefits of using multiple agents are like those of a single agent but often magnified. Where a single agent typically specializes in a single task, multi-agent systems can tackle multiple tasks in parallel. Multiple agents can also provide feedback and evaluation, reducing errors when completing assignments.

As we can see, an AI agent or agent system can be assembled in multiple ways. However, an agent itself can also be assembled using multiple components. In the next section, we’ll cover topics ranging from an agent’s profile to the actions it may perform, as well as memory and planning.

1.2 Understanding the component systems of an agent

Agents can be complex units composed of multiple component systems. These components are the tools the agent employs to help it complete its goal or assigned tasks and even create new ones. Components may be simple or complex systems, typically split into five categories.

Figure 1.3 describes the major categories of components a single-agent system may incorporate. Each element will have subtypes that can define the component’s type, structure, and use. At the core of all agents is the profile and persona; extending from that are the systems and functions that enhance the agent.

Figure 1.3 The five main components of a single-agent system (image created by DALL-E 3)

The agent profile and persona shown in figure 1.4 represent the base description of the agent. The persona—often called the system prompt —guides an agent to complete tasks, learn how to respond, and other nuances. It includes elements such as the background (e.g., coder, writer) and demographics, and it can be generated through methods such as handcrafting, LLM assistance, or data-driven techniques, including evolutionary algorithms.
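
As a rough sketch of how a persona travels with every request, the following uses the chat-style message format common to most LLM APIs. The `coder_profile` dictionary and `build_messages` helper are hypothetical names for illustration, not part of any particular library.

```python
# A persona expressed as a system prompt, attached to each user request.

coder_profile = {
    "name": "coder",
    "persona": (
        "You are a senior Python developer. "
        "You write clean, well-commented code and explain your choices briefly."
    ),
}

def build_messages(profile, user_request):
    # The persona is sent as the system message ahead of the user's request,
    # so it shapes how the model responds to every task.
    return [
        {"role": "system", "content": profile["persona"]},
        {"role": "user", "content": user_request},
    ]

messages = build_messages(coder_profile, "Write a function that reverses a string.")
print(messages[0]["role"])  # system
```

Handcrafted profiles like this are the simplest case; the LLM-assisted and data-driven techniques mentioned above generate or refine the `persona` string automatically.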

Figure 1.4 A deeper look at how we'll explore creating the agent profile

We’ll explore how to create effective and specific agent profiles/personas through techniques such as rubrics and grounding. In addition, we’ll explain the aspects of human-formulated versus AI-formulated (LLM) profiles, including innovative techniques using data and evolutionary algorithms to build profiles.

Figure 1.5 demonstrates the component actions and tool use in the context of agents involving activities directed toward task completion or acquiring information. These actions can be categorized into task completion, exploration, and communication, with varying levels of effect on the agent’s environment and internal states. Actions can be generated manually, through memory recollection, or by following predefined plans, influencing the agent’s behavior and enhancing learning.

Figure 1.5 The aspects of agent actions this book will explore

Understanding the action target helps us define clear objectives for task completion, exploration, or communication. Recognizing the action effect reveals how actions influence task outcomes, the agent’s environment, and its internal states, contributing to efficient decision making. Lastly, grasping action generation methods equips us with the knowledge to create actions manually, recall them from memory, or follow predefined plans, enhancing our ability to effectively shape agent behavior and learning processes.

Figure 1.6 shows the component knowledge and memory in more detail. Agents use knowledge and memory to annotate context with the most pertinent information while limiting the number of tokens used. Knowledge and memory structures can be unified, where both subsets follow a single structure or hybrid structure involving a mix of different retrieval forms. Knowledge and memory formats can vary widely from language (e.g., PDF documents) to databases (relational, object, or document) and embeddings, simplifying semantic similarity search through vector representations or even simple lists serving as agent memories.

Figure 1.6 Exploring the role and use of agent memory and knowledge

Figure 1.7 shows the reasoning and evaluation component of an agent system. Research and practical applications have shown that LLMs/agents can effectively reason. Reasoning and evaluation systems annotate an agent’s workflow by providing an ability to think through problems and evaluate solutions.

Figure 1.7 The reasoning and evaluation component and details

Figure 1.8 shows the component agent planning/feedback and its role in organizing tasks to achieve higher-level goals. It can be categorized into these two approaches:

  • Planning without feedback —Autonomous agents make decisions independently.
  • Planning with feedback —Monitoring and modifying plans is based on various sources of input, including environmental changes and direct human feedback.
Figure 1.8 Exploring the role of agent planning and reasoning

Within planning, agents may employ single-path reasoning, sequential reasoning through each step of a task, or multipath reasoning to explore multiple strategies and save the efficient ones for future use. External planners, which can be code or other agent systems, may also play a role in orchestrating plans.

Any of our previous agent types—the proxy agent/assistant, agent/assistant, or autonomous agent—may use some or all of these components. Even the planning component has a role outside of the autonomous agent and can effectively empower even the regular agent.

1.3 Examining the rise of the agent era: Why agents?

AI agents and assistants have quickly moved from the main commodity in AI research to mainstream software development. An ever-growing list of tools and platforms assist in the construction and empowerment of agents. To an outsider, it may all seem like hype intended to inflate the value of some cool but overrated technology.

During the first few months after ChatGPT’s initial release, a new discipline called prompt engineering was formed: users found that using various techniques and patterns in their prompts allowed them to generate better and more consistent output. However, users also realized that prompt engineering could only go so far.

Prompt engineering is still an excellent way to interact directly with LLMs such as ChatGPT. Over time, many users discovered that effective prompting required iteration, reflection, and more iteration. The first agent systems, such as AutoGPT, emerged from these discoveries, capturing the community’s attention.

Figure 1.9 shows the original design of AutoGPT, one of the first autonomous agent systems. The agent is designed to iterate a planned sequence of tasks that it defines by looking at the user’s goal. Through each task iteration of steps, the agent evaluates the goal and determines if the task is complete. If the task isn’t complete, the agent may replan the steps and update the plan based on new knowledge or human feedback.

Figure 1.9 The original design of the AutoGPT agent system

AutoGPT became the first example to demonstrate the power of using task planning and iteration with LLM models. From this and in tandem, other agent systems and frameworks exploded into the community using similar planning and task iteration systems. It’s generally accepted that planning, iteration, and repetition are the best processes for solving complex and multifaceted goals for an LLM.

However, autonomous agent systems require trust in the agent decision-making process, the guardrails/evaluation system, and the goal definition. Trust is also something that is acquired over time. Our lack of trust stems from our lack of understanding of an autonomous agent’s capabilities.

For this reason, many of the mainstream and production-ready agent tools aren’t autonomous. However, they still provide a significant benefit in managing and automating tasks using GPTs (LLMs). Therefore, as our goal in this book is to understand all agent forms, many more practical applications will be driven by non-autonomous agents.

Agents and agent tools are only the top layer of a new software application development paradigm. We’ll look at this new paradigm in the next section.

1.4 Peeling back the AI interface

The AI agent paradigm is not only a shift in how we work with LLMs but is also perceived as a shift in how we develop software and handle data. Software and data will no longer be interfaced using user interfaces (UIs), application programming interfaces (APIs), and specialized query languages such as SQL. Instead, they will be designed to be interfaced using natural language.

Figure 1.10 shows a high-level snapshot of what this new architecture may look like and what role AI agents play. Data, software, and applications adapt to support semantic, natural language interfaces. These AI interfaces allow agents to collect data and interact with software applications, even other agents or agent applications. This represents a new shift in how we interact with software and applications.

Figure 1.10 A vision for how agents will interact with software systems

An AI interface is a collection of functions, tools, and data layers that expose data and applications by natural language. In the past, the word semantic has been heavily used to describe these interfaces, and even some tools use the name; however, “semantic” can also have a variety of meanings and uses. Therefore, in this book, we’ll use the term AI interface.

The construction of AI interfaces will empower agents that need to consume the services, tools, and data. With this empowerment will come increased accuracy in completing tasks and more trustworthy and autonomous applications. While an AI interface may not be appropriate for all software and data, it will dominate many use cases.

1.5 Navigating the agent landscape

GPT agents represent an entire shift in how consumers and developers approach everything, from finding information to building software and accessing data. Almost daily, a new agent framework, component, or interface pops up on GitHub or in a research paper. This can be overwhelming and intimidating to the new user trying to grasp what agent systems are and how to use them.

Summary

  • An agent is an entity that acts or exerts power, produces an effect, or serves as a means for achieving a result. An agent automates interaction with a large language model (LLM) in AI.
  • An assistant is synonymous with an agent. Both terms encompass tools such as OpenAI’s GPT Assistants.
  • Autonomous agents can make independent decisions, and their distinction from non-autonomous agents is crucial.
  • The four main types of LLM interactions include direct user interaction, agent/assistant proxy, agent/assistant, and autonomous agent.
  • Multi-agent systems involve multiple agents working together, often controlled by a proxy, to accomplish complex tasks.
  • The main components of an agent include the profile/persona, actions, knowledge/memory, reasoning/evaluation, and planning/feedback.
  • Agent profiles and personas guide an agent’s tasks, responses, and other nuances, often including background and demographics.
  • Actions and tools for agents can be manually generated, recalled from memory, or follow predefined plans.
  • Agents use knowledge and memory structures to optimize context and minimize token usage via various formats, from documents to embeddings.
  • Reasoning and evaluation systems enable agents to think through problems and assess solutions using prompting patterns such as zero-shot, one-shot, and few-shot.
  • Planning/feedback components organize tasks to achieve goals using single-path or multipath reasoning and integrating environmental and human feedback.
  • The rise of AI agents has introduced a new software development paradigm, shifting from traditional to natural language–based AI interfaces.
  • Understanding the progression and interaction of these tools helps develop agent systems, whether single, multiple, or autonomous.

2 Harnessing the power of large language models

This chapter covers

  • Understanding the basics of LLMs
  • Connecting to and consuming the OpenAI API
  • Exploring and using open source LLMs with LM Studio
  • Prompting LLMs with prompt engineering
  • Choosing the optimal LLM for your specific needs

The term large language models (LLMs) has now become a ubiquitous descriptor of a form of AI. These LLMs have been developed using generative pretrained transformers (GPTs). While other architectures also power LLMs, the GPT form is currently the most successful.

LLMs and GPTs are generative models, which means they are trained to generate rather than predict or classify content. To illustrate this further, consider figure 2.1, which shows the difference between generative and predictive/classification models. Generative models create something from the input, whereas predictive and classifying models classify it.

Figure 2.1 The difference between generative and predictive models

We can further define an LLM by its constituent parts, as shown in figure 2.2. In this diagram, data represents the content used to train the model, and architecture is an attribute of the model itself, such as the number of parameters or size of the model. Models are further trained specifically to the desired use case, including chat, completions, or instruction. Finally, fine-tuning is a feature added to models that refines the input data and model training to better match a particular use case or domain.

Figure 2.2 The main elements of an LLM

The transformer architecture of GPTs, which is a specific architecture of LLMs, allows the models to be scaled to billions of parameters in size. This requires these large models to be trained on terabytes of documents to build a foundation. From there, these models will be successively trained using various methods for the desired use case of the model.

ChatGPT, for example, is trained effectively on the public internet and then fine-tuned using several training strategies. The final fine-tuning training is completed using an advanced form called reinforcement learning with human feedback (RLHF). This produces a model use case called chat completions.

Chat completions LLMs are designed to improve through iteration and refinement—in other words, chatting. These models have also been benchmarked to be the best in task completion, reasoning, and planning, which makes them ideal for building agents and assistants. Completion models are trained/designed only to provide generated content on input text, so they don’t benefit from iteration.

For our journey to build powerful agents in this book, we focus on the class of LLMs called chat completions models. That, of course, doesn’t preclude you from trying other model forms for your agents. However, you may have to significantly alter the code samples provided to support other model forms.

We’ll uncover more details about LLMs and GPTs later in this chapter when we look at running an open source LLM locally. In the next section, we look at how to connect to an LLM using a growing standard from OpenAI.

2.1 Mastering the OpenAI API

Numerous AI agents and assistant projects use the OpenAI API SDK to connect to an LLM. While not standard, the basic concepts describing a connection now follow the OpenAI pattern. Therefore, we must understand the core concepts of an LLM connection using the OpenAI SDK.

This chapter will look at connecting to an LLM model using the OpenAI Python SDK/package. We’ll discuss connecting to a GPT-4 model, the model response, counting tokens, and how to define consistent messages. Starting in the following subsection, we’ll examine how to use OpenAI.

2.1.1 Connecting to the chat completions model

To complete the exercises in this section and subsequent ones, you must set up a Python developer environment and get access to an LLM. Appendix A walks you through setting up an OpenAI account and accessing GPT-4 or other models. Appendix B demonstrates setting up a Python development environment with Visual Studio Code (VS Code), including installing needed extensions. Review these sections if you want to follow along with the scenarios.

Start by opening the source code chapter_2 folder in VS Code and creating a new Python virtual environment. Again, refer to appendix B if you need assistance.

Then, install the OpenAI and Python dot environment packages using the command in the following listing. This will install the required packages into the virtual environment.

Listing 2.1 pip install
pip install openai python-dotenv

Next, open the connecting.py file in VS Code, and inspect the code shown in listing 2.2. Be sure to set the model’s name to an appropriate name—for example, gpt-4. At the time of writing, the gpt-4-1106-preview was used to represent GPT-4 Turbo.

Listing 2.2 connecting.py
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()                           #1
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:                             #2
    raise ValueError("No API key found. Please check your .env file.")
client = OpenAI(api_key=api_key)                        #3

def ask_chatgpt(user_message):
    response = client.chat.completions.create(      #4
        model="gpt-4-1106-preview",
        messages=[{"role": "system",
 "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message}],
        temperature=0.7,
        )
    return response.choices[0].message.content     #5

user = "What is the capital of France?"
response = ask_chatgpt(user)                #6
print(response)
#1 Loads the key stored in the .env file

#2 Checks that the key is set

#3 Creates the client using the key

#4 Uses the create function to generate a response

#5 Returns only the content of the response

#6 Executes the request and returns the response

A lot is happening here, so let’s break it down by section, starting with the beginning and loading the environment variables. In the chapter_2 folder is another file called .env, which holds environment variables. These variables are set automatically by calling the load_dotenv function.

You must set your OpenAI API key in the .env file, as shown in the next listing. Again, refer to appendix A to find out how to get a key and find a model name.

Listing 2.3 .env
OPENAI_API_KEY='your-openai-api-key'

After setting the key, you can debug the file by pressing the F5 key or selecting Run > Start Debugging from the VS Code menu. This will run the code, and you should see something like “The capital of France is Paris.”

Remember that the response from a generative model depends on the probability. The model will probably give us a correct and consistent answer in this case.

You can play with these probabilities by adjusting the temperature of the request. If you want a model to be more consistent, turn the temperature down to 0, but if you want the model to produce more variation, turn the temperature up. We’ll explore setting the temperature further in the next section.

2.1.2 Understanding the request and response

Digging into the chat completions request and response features can be helpful. We’ll focus on the request first, as shown next. The request encapsulates the intended model, the messages, and the temperature.

Listing 2.4 A chat completions request
response = client.chat.completions.create(
    model="gpt-4-1106-preview",                 #1
    messages=[{"role": "system", 
"content": "You are a helpful assistant."},                     #2
              {"role": "user", "content": user_message}],      #3
    temperature=0.7,     #4
    )
#1 The model or deployment used to respond to the request

#2 The system role message

#3 The user role message

#4 The temperature, or variability, of the request

Within the request, the messages block describes a set of messages and roles used in a request. Messages for a chat completions model can be defined in three roles:

  • System role —A message that describes the request’s rules and guidelines. It can often be used to describe the role of the LLM in making the request.
  • User role —Represents and contains the message from the user.
  • Assistant role —Can be used to capture the message history of previous responses from the LLM. It can also inject a message history when perhaps none existed.
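
The three roles can be combined into a running message history. Here is a minimal sketch; the add_turn helper is an illustrative assumption, not part of the OpenAI SDK:

```python
# Minimal sketch of building a chat message history across the three roles.
# The add_turn helper is illustrative and not part of the OpenAI SDK.
VALID_ROLES = ("system", "user", "assistant")

def add_turn(history: list, role: str, content: str) -> list:
    """Append one message to the running history, validating the role."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role}")
    history.append({"role": role, "content": content})
    return history

history = add_turn([], "system", "You are a helpful assistant.")
add_turn(history, "user", "What is the capital of France?")
add_turn(history, "assistant", "The capital of France is Paris.")
print(len(history))  # → 3
```

The resulting list can be passed directly as the messages argument of a chat completions request.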

The message sent in a single request can encapsulate an entire conversation, as shown in the JSON in the following listing.

Listing 2.5 Messages with history
[
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "What is the capital of France?"
    },
    {
        "role": "assistant",
        "content": "The capital of France is Paris."
    },
    {
        "role": "user",
        "content": "What is an interesting fact of Paris."
    }
],

You can see how this can be applied by opening message_history.py in VS Code and debugging it by pressing F5. After the file runs, be sure to check the output. Then, try to run the sample a few more times to see how the results change.

The results will change from each run to the next due to the high temperature of .7. Go ahead and reduce the temperature to .0, and run the message_history.py sample a few more times. Keeping the temperature at 0 will show the same or similar results each time.

Setting a request’s temperature will often depend on your particular use case. Sometimes, you may want to limit the responses’ stochastic nature (randomness). Reducing the temperature to 0 will give consistent results. Likewise, a value of 1.0 will give the most variability in the responses.

Next, we also want to know what information is being returned for each request. The next listing shows the output format for the response. You can see this output by running the message_history.py file in VS Code.

Listing 2.6 A chat completions response
{
    "id": "chatcmpl-8WWL23up3IRfK1nrDFQ3EHQfhx0U6",
    "choices": [                                      #1
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "… omitted",
                "role": "assistant",       #2
                "function_call": null,
                "tool_calls": null
            },
            "logprobs": null
        }
    ],
    "created": 1702761496,
    "model": "gpt-4-1106-preview",     #3
    "object": "chat.completion",
    "system_fingerprint": "fp_3905aa4f79",
    "usage": {
        "completion_tokens": 78,     #4
        "prompt_tokens": 48,         #4
        "total_tokens": 126          #4
    }
}
#1 A model may return more than one response.

#2 The response returned in the assistant role.

#3 Indicates the model used.

#4 Counts the number of input (prompt) and output (completion) tokens used.

It can be helpful to track the number of input tokens (those used in prompts) and the output tokens (the number returned through completions). Sometimes, minimizing and reducing the number of tokens can be essential. Having fewer tokens typically means LLM interactions will be cheaper, respond faster, and produce better and more consistent results.
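
As a rough rule of thumb, one token corresponds to about four characters of English text. The sketch below uses that heuristic only; for exact counts you would use the model's tokenizer (e.g., OpenAI's tiktoken library):

```python
# Rough token estimate using the ~4-characters-per-token rule of thumb.
# This is an approximation only; exact counts require the model's tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
print(prompt_tokens)  # → 14
```

Comparing an estimate like this against the usage block returned by the API is a quick way to sanity-check how much context a request consumes.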

That covers the basics of connecting to an LLM and returning responses. Throughout this book, we’ll review and expand on how to interact with LLMs. Until then, we’ll explore in the next section how to load and use open source LLMs.

2.2 Exploring open source LLMs with LM Studio

Commercial LLMs, such as GPT-4 from OpenAI, are an excellent place to start to learn how to use modern AI and build agents. However, commercial agents are an external resource that comes at a cost, reduces data privacy and security, and introduces dependencies. Other external influences can further complicate these factors.

It’s unsurprising that the race to build comparable open source LLMs is growing more competitive every day. As a result, there are now open source LLMs that may be adequate for numerous tasks and agent systems. There have even been so many advances in tooling in just a year that hosting LLMs locally is now very easy, as we’ll see in the next section.

2.2.1 Installing and running LM Studio

LM Studio is a free download that supports downloading and hosting LLMs and other models locally for Windows, Mac, and Linux. The software is easy to use and offers several helpful features to get you started quickly. Here is a quick summary of steps to download and set up LM Studio:

  1. Download LM Studio from https://lmstudio.ai/.
  2. After downloading, install the software per your operating system. Be aware that some versions of LM Studio may be in beta and require installation of additional tools or libraries.
  3. Launch the software.

Figure 2.3 shows the LM Studio window running. From there, you can review the current list of hot models, search for others, and even download. The home page content can be handy for understanding the details and specifications of the top models.

Figure 2.3 The LM Studio home page

An appealing feature of LM Studio is its ability to analyze your hardware and align it with the requirements of a given model. The software will let you know how well you can run a given model. This can be a great time saver in guiding what models you experiment with.

Enter some text to search for a model, and click Go. You’ll be taken to the search page interface, as shown in figure 2.4. From this page, you can see all the model variations and other specifications, such as context token size. After you click the Compatibility Guess button, the software will even tell you if the model will run on your system.

Figure 2.4 The LM Studio search page

Click to download any model that will run on your system. You may want to stick with models designed for chat completions, but if your system is limited, work with what you have. In addition, if you’re unsure of which model to use, go ahead and download to try them. LM Studio is a great way to explore and experiment with many models.

After the model is downloaded, you can then load and run the model on the chat page or as a server on the server page. Figure 2.5 shows loading and running a model on the chat page. It also shows the option for enabling and using a GPU if you have one.

Figure 2.5 The LM Studio chat page with a locally running LLM loaded

To load and run a model, open the drop-down menu at the top middle of the page, and select a downloaded model. A progress bar will appear showing the model loading, and when it’s ready, you can start typing into the UI.

The software even allows you to use some or all of your GPU, if detected, for the model inference. A GPU will generally speed up the model response times in some capacities. You can see how adding a GPU can affect the model’s performance by looking at the performance status at the bottom of the page, as shown in figure 2.5.

Chatting with a model and using or playing with various prompts can help you determine how well a model will work for your given use case. A more systematic approach is using the prompt flow tool for evaluating prompts and LLMs. We’ll describe how to use prompt flow in chapter 9.

LM Studio also allows a model to be run on a server and made accessible using the OpenAI package. We’ll see how to use the server feature and serve a model in the next section.

2.2.2 Serving an LLM locally with LM Studio

Running an LLM locally as a server is easy with LM Studio. Just open the server page, load a model, and then click the Start Server button, as shown in figure 2.6. From there, you can copy and paste any of the examples to connect with your model.

Figure 2.6 The LM Studio server page and a server running an LLM

You can review an example of the Python code by opening chapter_2/lmstudio_ server.py in VS Code. The code is also shown here in listing 2.7. Then, run the code in the VS Code debugger (press F5).

Listing 2.7 lmstudio_server.py
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
  model="local-model",                           #1
  messages=[
    {"role": "system", "content": "Always answer in rhymes."},
    {"role": "user", "content": "Introduce yourself."}       #2
  ],
  temperature=0.7,
)

print(completion.choices[0].message)      #3
#1 Not currently used; can be anything

#2 Feel free to change the messages.

#3 The default code outputs the full message.

If you encounter problems connecting to the server or experience any other problems, be sure your configuration for the Server Model Settings matches the model type. For example, in figure 2.6, shown earlier, the loaded model differs from the server settings. The corrected settings are shown in figure 2.7.
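When debugging connection problems, it also helps to confirm the server is reachable before involving the OpenAI client. LM Studio's server is OpenAI compatible, so it should answer a plain GET request to the /models endpoint; this sketch assumes the default port 1234 shown in listing 2.7 (check your server page if you changed it):

```python
import json
import urllib.request

def list_local_models(base_url="http://localhost:1234/v1"):
    """Return the /models payload from a local OpenAI-compatible server.

    Raises urllib.error.URLError if the server isn't running.
    """
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.loads(resp.read().decode("utf-8"))

# With LM Studio serving, this lists the loaded model:
# print(list_local_models())
```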

Figure 2.7 Selecting the correct Server Model Settings for the loaded model

Now, you can use a locally hosted LLM or a commercial model to build, test, and potentially even run your agents. The following section will examine how to build prompts using prompt engineering more effectively.

2.3 Prompting LLMs with prompt engineering

A prompt, for an LLM, is the message content sent in a request to elicit better response output. Prompt engineering is a new and emerging field that attempts to structure a methodology for building prompts. Unfortunately, prompt building isn’t a well-established science, and there is a growing and diverse set of methods defined as prompt engineering.

Fortunately, organizations such as OpenAI have begun documenting a universal set of strategies, as shown in figure 2.8. These strategies cover various tactics, some requiring additional infrastructure and considerations. As such, the prompt engineering strategies relating to more advanced concepts will be covered in the indicated chapters.

Figure 2.8 The OpenAI prompt engineering strategies reviewed in this book, by chapter location

Each strategy in figure 2.8 unfolds into tactics that can further refine the specific method of prompt engineering. This chapter will examine the fundamental Write Clear Instructions strategy. Figure 2.9 shows the tactics for this strategy in more detail, along with examples for each tactic. We’ll look at running these examples using a code demo in the following sections.

Figure 2.9 The tactics of the Write Clear Instructions strategy

The Write Clear Instructions strategy is about being careful and specific about what you ask for. Asking an LLM to perform a task is no different from asking a person to complete the same task. Generally, the more information and context relevant to a task you can specify in a request, the better the response.

This strategy has been broken down into specific tactics you can apply to prompts. To understand how to use those, a code demo (prompt_engineering.py) with various prompt examples is in the chapter 2 source code folder.

Open the prompt_engineering.py file in VS Code, as shown in listing 2.8. This code starts by loading all the JSON Lines files in the prompts folder. Then, it displays the list of files as choices and allows the user to select a prompt option. After selecting the option, the prompts are submitted to an LLM, and the response is printed.

Listing 2.8 prompt_engineering.py (main())
def main():
    directory = "prompts"
    text_files = list_text_files_in_directory(directory)    #1

    if not text_files:
        print("No text files found in the directory.")
        return

    def print_available():                                     #2
        print("Available prompt tactics:")
        for i, filename in enumerate(text_files, start=1):
            print(f"{i}. {filename}")

    while True:
        try:
            print_available()                   #2              
            choice = int(input("Enter … 0 to exit): "))           #3
            if choice == 0:
                break
            elif 1 <= choice <= len(text_files):
                selected_file = text_files[choice - 1]
                file_path = os.path.join(directory,
      selected_file)
                prompts = 
 load_and_parse_json_file(file_path)                          #4
                print(f"Running prompts for {selected_file}")
                for i, prompt in enumerate(prompts):
                    print(f"PROMPT {i+1} --------------------")
                    print(prompt)
                    print(f"REPLY ---------------------------")
                    print(prompt_llm(prompt))                       #5
            else:
                print("Invalid choice. Please enter a valid number.")
        except ValueError:
            print("Invalid input. Please enter a number.")
#1 Gathers all the files in the given folder

#2 Prints the list of files as choices

#3 Inputs the user’s choice

#4 Loads the prompts and parses them into messages

#5 Submits the prompts to the OpenAI LLM
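Listing 2.8 calls helpers that aren't shown. As a sketch, assuming each prompt file holds one or more concatenated JSON arrays of messages (the format of the listings that follow), the parsing helper might look like this; the name `load_and_parse_json_file` matches the listing, but this implementation is an assumption:

```python
import json

def load_and_parse_json_file(file_path):
    """Parse a file of one or more concatenated JSON arrays of chat
    messages, returning a list of message lists (one per prompt)."""
    with open(file_path, encoding="utf-8") as f:
        text = f.read()
    decoder = json.JSONDecoder()
    prompts, pos = [], 0
    while pos < len(text):
        # Skip whitespace between the concatenated JSON documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, end = decoder.raw_decode(text, pos)  # parse one array
        prompts.append(obj)
        pos = end
    return prompts
```

`JSONDecoder.raw_decode` is used because the files are not strict JSON Lines: each prompt is a pretty-printed array spanning several lines, so a plain per-line `json.loads` would fail.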

A commented-out section from the listing demonstrates how to connect to a local LLM. This will allow you to explore the same prompt engineering tactics applied to open source LLMs running locally. By default, this example uses the OpenAI model we configured previously in section 2.1.1. If you didn’t complete that earlier, please go back and do it before running this one.

Figure 2.10 shows the output of running the prompt engineering tactics tester, the prompt_engineering.py file in VS Code. When you run the tester, you can enter a value for the tactic you want to test and watch it run.

Figure 2.10 The output of the prompt engineering tactics tester

In the following sections, we’ll explore each prompt tactic in more detail. We’ll also examine the various examples.

2.3.1 Creating detailed queries

The basic premise of this tactic is to provide as much detail as possible but also to be careful not to give irrelevant details. The following listing shows the JSON Lines file examples for exploring this tactic.

Listing 2.9 detailed_queries.jsonl
[                        #1
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "What is an agent?"      #2
    }
]
[
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": """
What is a GPT Agent? 
Please give me 3 examples of a GPT agent
"""                                        #3
    }
]
#1 The first example doesn’t use a detailed query.

#2 Starts by asking the LLM a very general question

#3 Asks a more specific question and requests examples

This example demonstrates the difference between using detailed queries and not. It also goes a step further by asking for examples. Remember, the more relevance and context you can provide in your prompt, the better the overall response. Asking for examples is another way of enforcing the relationship between the question and the expected output.

2.3.2 Adopting personas

Adopting personas grants the ability to define an overarching context or set of rules to the LLM. The LLM can then use that context and/or rules to frame all later output responses. This is a compelling tactic and one that we’ll make heavy use of throughout this book.

Listing 2.10 shows an example of employing two personas to answer the same question. This can be an enjoyable technique for exploring a wide range of novel applications, from getting demographic feedback to specializing in a specific task or even rubber ducking.

Listing 2.10 adopting_personas.jsonl
[
    {
        "role": "system",
        "content": """
You are a 20 year old female who attends college 
in computer science. Answer all your replies as 
a junior programmer.
"""                         #1
    },
    {
        "role": "user",
        "content": "What is the best subject to study."
    }
]
[
    {
        "role": "system",
        "content": """
You are a 38 year old male registered nurse. 
Answer all replies as a medical professional.
"""                                             #2
    },
    {
        "role": "user",
        "content": "What is the best subject to study."
    }
]
#1 The first persona

#2 The second persona

A core element of agent profiles is the persona. We’ll employ various personas to assist agents in completing their tasks. When you run this tactic, pay particular attention to the way the LLM outputs the response.

2.3.3 Using delimiters

Delimiters are a useful way of isolating and getting the LLM to focus on some part of a message. This tactic is often combined with other tactics but can work well independently. The following listing demonstrates two examples, but there are several other ways of describing delimiters, from XML tags to using markdown.

Listing 2.11 using_delimiters.jsonl
[
    {
        "role": "system",
        "content": """
Summarize the text delimited by triple quotes 
with a haiku.
"""               #1
    },
    {
        "role": "user",
        "content": "A gold chain is cool '''but a silver chain is better'''"
    }
]
[
    {
        "role": "system",
        "content": """
You will be provided with a pair of statements 
(delimited with XML tags) about the same topic. 
First summarize the arguments of each statement. 
Then indicate which of them makes a better statement
 and explain why.
"""                        #2
    },
    {
        "role": "user",
        "content": """
<statement>gold chains are cool</statement>
<statement>silver chains are better</statement>
"""
    }
]
#1 The delimiter is defined by the character type and the number of repetitions.

#2 The delimiters are defined by XML tags.

When you run this tactic, pay attention to the parts of the text the LLM focuses on when it outputs the response. This tactic can be beneficial for describing information in a hierarchy or other relationship patterns.

2.3.4 Specifying steps

Specifying steps is another powerful tactic that can have many uses, including in agents, as shown in listing 2.12. It’s especially powerful when developing prompts or agent profiles for complex multistep tasks. You can specify steps to break down these complex prompts into a step-by-step process that the LLM can follow. In turn, these steps can guide the LLM through multiple interactions over a more extended conversation and many iterations.

Listing 2.12 specifying_steps.jsonl
[
    {
        "role": "system",
        "content": """
Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text in triple single quotes. 
Summarize this text in one sentence with a prefix that says 'Summary: '.
Step 2 - Translate the summary from Step 1 into Spanish, 
with a prefix that says 'Translation: '.
"""                                          #1
    },
    {
        "role": "user",
        "content": "'''I am hungry and would like to order an appetizer.'''"
    }
]
[
    {
        "role": "system",
        "content": """
Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text. Answer any questions in 
the text in one sentence with a prefix that says 'Answer: '.

Step 2 - Translate the Answer from Step 1 into a dad joke,
 with a prefix that says 'Dad Joke: '."""                      #2
    },
    {
        "role": "user",
        "content": "What is the tallest structure in Paris?"
    }
]
#1 Note the use of the delimiters tactic.

#2 The steps can be completely different operations.

2.3.5 Providing examples

Providing examples is an excellent way to guide the desired output of an LLM. There are numerous ways to demonstrate examples to an LLM. The system message/prompt can be a helpful way to emphasize general output. In the following listing, the example is added as the last LLM assistant reply, given the prompt “Teach me about Python.”

Listing 2.13 providing_examples.jsonl
[
    {
        "role": "system",
        "content": """
Answer all replies in a consistent style that follows the format, 
length and style of your previous responses.
Example:
  user:
       Teach me about Python.
  assistant:                                                #1
       Python is a programming language developed in 1989
 by Guido van Rossum.

  Future replies:
       The response was only a sentence so limit
 all future replies to a single sentence.
"""                                           #2
    },
    {
        "role": "user",
        "content": "Teach me about Java."
    }
]
#1 Injects the example output as a previous assistant reply

#2 Adds an output-limiting tactic to constrain the output size and match the example

Providing examples can also be used to request a particular output format from a complex series of tasks that derive the output. For example, asking an LLM to produce code that matches a sample output is an excellent use of examples. We’ll employ this tactic throughout the book, but other methods exist for guiding output.
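Listing 2.13 embeds the sample exchange inside the system message. An alternative is to supply the example as real few-shot conversation turns, which the Chat Completions message format also supports. This sketch only constructs the message list (it isn't sent to a model here):

```python
# Few-shot prompting: the example exchange is injected as prior turns,
# so the model imitates the earlier assistant reply's style and length.
messages = [
    {"role": "system", "content": "Answer all replies in a consistent style."},
    {"role": "user", "content": "Teach me about Python."},
    {"role": "assistant",
     "content": "Python is a programming language developed in 1989 "
                "by Guido van Rossum."},
    {"role": "user", "content": "Teach me about Java."},
]
# Passing messages to client.chat.completions.create(...) would bias
# the new reply toward the single-sentence example above.
```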

2.3.6 Specifying output length

The tactic of specifying output length can be helpful in not just limiting tokens but also in guiding the output to a desired format. Listing 2.14 shows an example of using two different techniques for this tactic. The first limits the output to fewer than 10 words. This can have the added benefit of making the response more concise and directed, which can be desirable for some use cases. The second example demonstrates limiting output to a concise set of bullet points. This method can help narrow down the output and keep answers short. More concise answers generally mean the output is more focused and contains less filler.

Listing 2.14 specifying_output_length.jsonl
[
    {
        "role": "system",
        "content": """
Summarize all replies into 10 or fewer words.
"""                                               #1
    },
    {
        "role": "user",
        "content": "Please tell me an exciting fact about Paris?"
    }
]
[
    {
        "role": "system",
        "content": """
Summarize all replies into 3 bullet points.
"""                                           #2
    },
    {
        "role": "user",
        "content": "Please tell me an exciting fact about Paris?"
    }
]
#1 Limiting the output can make answers more concise.

#2 Limits the answer to a short set of bullet points
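Prompt instructions aren't the only lever for output length: the Chat Completions API also accepts a max_tokens parameter that hard-caps the reply server-side. A minimal sketch, where the request payload is only constructed, not sent, and the model name is illustrative:

```python
# The prompt-based limit shapes the style; max_tokens enforces a hard
# cap on reply length in tokens, regardless of what the model "wants".
request = {
    "model": "gpt-4-turbo",  # illustrative model name
    "messages": [
        {"role": "system",
         "content": "Summarize all replies into 10 or fewer words."},
        {"role": "user",
         "content": "Please tell me an exciting fact about Paris?"},
    ],
    "max_tokens": 32,
}
# client.chat.completions.create(**request) would then apply the cap.
```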

Keeping answers brief can have additional benefits when developing multi-agent systems. Any agent system that converses with other agents can benefit from more concise and focused replies. It tends to keep the LLM more focused and reduces noisy communication.

Be sure to run through all the examples of the prompt tactics for this strategy. As mentioned, we’ll cover other prompt engineering strategies and tactics in future chapters. We’ll finish this chapter by looking at how to pick the best LLM for your use case.

2.4 Choosing the optimal LLM for your specific needs

While being a successful crafter of AI agents doesn’t require an in-depth understanding of LLMs, it’s helpful to be able to evaluate their specifications. Like a computer user, you don’t need to know how to build a processor to understand the differences between processor models. The analogy holds well for LLMs: while the criteria differ, evaluation still comes down to a few primary considerations.

From our previous discussion and look at LM Studio, we can extract some fundamental criteria that will be important to us when considering LLMs. Figure 2.11 explains the essential criteria to define what makes an LLM worth considering for creating a GPT agent or any LLM task.

Figure 2.11 The important criteria to consider when consuming LLMs

For our purposes of building AI agents, we need to look at each of these criteria in terms related to the task. Model context size and speed could be considered the sixth and seventh criteria, but they are usually considered variations of a model deployment architecture and infrastructure. An eighth criterion to consider for an LLM is cost, but this depends on many other factors. Here is a summary of how these criteria relate to building AI agents:

  • Model performance —You’ll generally want to understand the LLM’s performance for a given set of tasks. For example, if you’re building an agent specific to coding, then an LLM that performs well on code will be essential.
  • Model parameters (size) —The size of a model is often an excellent indication of inference performance and how well the model responds. However, the size of a model will also dictate your hardware requirements. If you plan to use your own locally hosted model, the model size will also primarily dictate the computer and GPU you need. Fortunately, we’re seeing small, very capable open source models being released regularly.
  • Use case (model type) —The type of model has several variations. Chat completions models such as ChatGPT are effective for iterating and reasoning through a problem, whereas models such as completion, question/answer, and instruct are more related to specific tasks. A chat completions model is essential for agent applications, especially those that iterate.
  • Training input —Understanding the content used to train a model will often dictate the domain of a model. While general models can be effective across tasks, more specific or fine-tuned models can be more relevant to a domain. This may be a consideration for a domain-specific agent where a smaller, more fine-tuned model may perform as well as or better than a larger model such as GPT-4.
  • Training method —It’s perhaps less of a concern, but it can be helpful to understand what method was used to train a model. How a model is trained can affect its ability to generalize, reason, and plan. This can be essential for planning agents but perhaps less significant for agents than for a more task-specific assistant.
  • Context token size —The context size of a model is more specific to the model architecture and type. It dictates the size of context or memory the model may hold. A smaller context window of less than 4,000 tokens is typically more than enough for simple tasks. However, a large context window can be essential when using multiple agents—all conversing over a task. The models will typically be deployed with variations on the context window size.
  • Model speed (model deployment) —The speed of a model is dictated by its inference speed (or how fast a model replies to a request), which in turn is dictated by the infrastructure it runs on. If your agent isn’t directly interacting with users, raw real-time speed may not be necessary. On the other hand, an LLM agent interacting in real time needs to be as quick as possible. For commercial models, speed will be determined and supported by the provider. Your infrastructure will determine the speed for those wanting to run their LLMs.
  • Model cost (project budget) —The cost is often dictated by the project. Whether learning to build an agent or implementing enterprise software, cost is always a consideration. A significant tradeoff exists between running your LLMs versus using a commercial API.
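For the context token size criterion above, a crude character-based estimate can flag prompts that risk overflowing a model's window. This is only a rough heuristic (about four characters per English token), not a replacement for the model's actual tokenizer (e.g., tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text.
    Use the model's own tokenizer for exact counts."""
    return max(1, len(text) // 4)

def fits_context(text: str, context_window: int = 4000,
                 reserve: int = 500) -> bool:
    """Check whether a prompt likely fits, reserving room for the reply."""
    return estimate_tokens(text) + reserve <= context_window
```

A check like this is most useful in multi-agent conversations, where accumulated message history, not any single prompt, is what exhausts the window.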

There is a lot to consider when choosing which model you want to build a production agent system on. However, picking and working with a single model is usually best for research and learning purposes. If you’re new to LLMs and agents, you’ll likely want to choose a commercial option such as GPT-4 Turbo. Unless otherwise stated, the work in this book will depend on GPT-4 Turbo.

Over time, models will undoubtedly be replaced by better models. So you may need to upgrade or swap out models. To do this, though, you must understand the performance metrics of your LLMs and agents. Fortunately, in chapter 9, we’ll explore evaluating LLMs, prompts, and agent profiles with prompt flow.

2.5 Exercises

Use the following exercises to help you engage with the material in this chapter:

  • Exercise 1—Consuming Different LLMs

Objective —Use the connecting.py code example to consume a different LLM from OpenAI or another provider.

Tasks:

    • Modify connecting.py to connect to a different LLM.
    • Choose an LLM from OpenAI or another provider.
    • Update the API keys and endpoints in the code.
    • Execute the modified code and validate the response.
  • Exercise 2—Exploring Prompt Engineering Tactics

Objective —Explore various prompt engineering tactics, and create variations for each.

Tasks:

    • Review the prompt engineering tactics covered in the chapter.
    • Write variations for each tactic, experimenting with different phrasing and structures.
    • Test the variations with an LLM to observe different outcomes.
    • Document the results, and analyze the effectiveness of each variation.
  • Exercise 3—Downloading and Running an LLM with LM Studio

Objective —Download an LLM using LM Studio, and connect it to prompt engineering tactics.

Tasks:

    • Install LM Studio on your machine.
    • Download an LLM using LM Studio.
    • Serve the model using LM Studio.
    • Write Python code to connect to the served model.
    • Integrate the prompt engineering tactics example with the served model.
  • Exercise 4—Comparing Commercial and Open source LLMs

Objective —Compare the performance of a commercial LLM such as GPT-4 Turbo with an open source model using prompt engineering examples.

Tasks:

    • Implement the prompt engineering examples using GPT-4 Turbo.
    • Repeat the implementation using an open source LLM.
    • Evaluate the models based on criteria such as response accuracy, coherence, and speed.
    • Document the evaluation process, and summarize the findings.
  • 练习 5 —LLM 的托管替代方案
  • Exercise 5—Hosting Alternatives for LLMs

目标 ——对比托管 LLM 的各种替代方案与使用商业模型的差异。

Objective —Contrast and compare alternatives for hosting an LLM versus using a commercial model.

任务:

Tasks:

    • 研究LLM的不同托管方案(例如,本地服务器、云服务)。
    • 评估每种托管方案的优点和缺点。
    • 从成本、性能和易用性方面,将这些选项与使用商业模型进行比较。
    • 撰写一份报告,总结比较结果,并根据具体用例推荐最佳方法。
    • Research different hosting options for LLMs (e.g., local servers, cloud services).
    • Evaluate the benefits and drawbacks of each hosting option.
    • Compare these options to using a commercial model in terms of cost, performance, and ease of use.
    • Write a report summarizing the comparison and recommending the best approach based on specific use cases.
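As a starting point for Exercise 3's final two tasks, here is a minimal sketch of connecting Python code to a model served by LM Studio. LM Studio exposes an OpenAI-compatible REST endpoint; the base URL below assumes its usual default port, and the model name, prompt text, and helper names are placeholders for illustration — check the Server tab in LM Studio for your actual settings.

```python
# Minimal sketch: talk to a local LM Studio server over its
# OpenAI-compatible REST API using only the standard library.
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # assumption: LM Studio's default port

def build_chat_request(user_prompt: str,
                       system_prompt: str = "You are a helpful assistant.") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "local-model",  # local servers typically ignore or map this name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

def chat(user_prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(user_prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Without a running server, we can still inspect the request we would send.
    print(build_chat_request("Explain prompt engineering in one sentence."))
```

Because the payload follows the OpenAI chat format, the same `build_chat_request` helper works unchanged if you later point `BASE_URL` at a commercial endpoint, which is handy for the model comparison in Exercise 4.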

概括

Summary

  • LLM 使用一种称为生成式预训练转换器 (GPT) 的架构。
  • LLMs use a type of architecture called generative pretrained transformers (GPTs).
  • 生成模型(例如 LLM 和 GPT)与预测/分类模型的区别在于,它们学习如何表示数据,而不仅仅是对数据进行分类。
  • Generative models (e.g., LLMs and GPTs) differ from predictive/classification models by learning how to represent data and not simply classify it.
  • LLM 由数据、架构和训练组合而成;针对特定用例对其进行适配的过程称为微调。
  • LLMs combine data, architecture, and training; adapting them to specific use cases is called fine-tuning.
  • OpenAI API SDK 既可用于连接 GPT-4 等商业 LLM,也可用于调用开源 LLM。
  • The OpenAI API SDK can be used to connect to commercial LLMs, such as GPT-4, and also to consume open source LLMs.
  • 您可以快速设置 Python 环境并安装 LLM 集成所需的软件包。
  • You can quickly set up Python environments and install the necessary packages for LLM integration.
  • LLM 可以处理各种请求并生成独特的响应,这些响应可用于增强与 LLM 集成相关的编程技能。
  • LLMs can handle various requests and generate unique responses that can be used to enhance programming skills related to LLM integration.
  • 开源 LLM 是商业模型的替代方案,可以使用 LM Studio 等工具在本地托管。
  • Open source LLMs are an alternative to commercial models and can be hosted locally using tools such as LM Studio.
  • 提示工程是一系列有助于设计更有效的提示以改善 LLM 响应的技术。
  • Prompt engineering is a collection of techniques that help craft more effective prompts to improve LLM responses.
  • LLM 可用于驱动代理和助手,从简单的聊天机器人到功能齐全的自主工作者。
  • LLMs can be used to power agents and assistants, from simple chatbots to fully capable autonomous workers.
  • 针对特定需求选择最合适的LLM取决于性能、参数、用例、训练输入和其他标准。
  • Selecting the most suitable LLM for specific needs depends on the performance, parameters, use case, training input, and other criteria.
  • 在本地运行 LLM 需要各种技能,从设置 GPU 到了解各种配置选项。
  • Running LLMs locally requires a variety of skills, from setting up GPUs to understanding various configuration options.

3. 与 GPT 助手协作

3 Engaging GPT assistants

本章内容

This chapter covers

  • 介绍 OpenAI GPT Assistants 平台和 ChatGPT 用户界面
  • Introducing the OpenAI GPT Assistants platform and the ChatGPT UI
  • 构建一个能够利用代码解释能力的 GPT
  • Building a GPT that can use the code interpretation capabilities
  • 通过自定义操作扩展助手
  • Extending an assistant via custom actions
  • 通过文件上传向 GPT 添加知识
  • Adding knowledge to a GPT via file uploads
  • 将您的 GPT 作品商业化并发布到 GPT 商店
  • Commercializing your GPT and publishing it to the GPT Store

在探索 OpenAI 进军智能助手领域以及其最终目标是构建名为 GPT Assistants 的智能体平台的过程中,我们将首先通过 ChatGPT 界面介绍 GPT Assistants。然后,我们将添加几个功能完善的助手,它们可以根据食材推荐食谱,像数据科学家一样进行全面的数据分析,引导读者阅读书籍,并支持自定义操作。在本章结束时,我们将能够构建一个功能齐全的智能体,并将其发布到 OpenAI GPT Store。

As we explore the OpenAI crusade into assistants and what has been hinted at, ultimately, an agent platform called GPT Assistants, we’ll introduce GPT assistants through the ChatGPT interface. Then, we’ll add in several fully developed assistants that can suggest recipes from ingredients, fully analyze data as a data scientist, guide readers through books, and be extended with custom actions. By the end of the chapter, we’ll be ready to build a fully functional agent that can be published to the OpenAI GPT Store.

3.1 通过 ChatGPT 探索 GPT 助手

3.1 Exploring GPT assistants through ChatGPT

ChatGPT(撰写本文时名为 ChatGPT Plus)允许您构建 GPT 助手、使用其他助手,甚至发布它们,您将在本章末尾看到相关内容。OpenAI 发布 GPT Assistants 平台,有助于定义和巩固 AI 代理的兴起。因此,对于任何对构建和使用代理系统感兴趣的人来说,它都值得认真研究。首先,我们将介绍如何通过 ChatGPT Plus 构建 GPT 助手,这需要付费订阅。如果您不想购买订阅,可以将本章作为入门指南,第 6 章将演示如何使用 API 服务。

ChatGPT (ChatGPT Plus, at the time of writing) allows you to build GPT assistants, consume other assistants, and even publish them, as you’ll see by the end of the chapter. When OpenAI announced the release of the GPT Assistants platform, it helped define and solidify the emergence of AI agents. As such, it’s worth a serious review by anyone interested in building and consuming agent systems. First, we’ll look at building GPT assistants through ChatGPT Plus, which requires a premium subscription. If you don’t want to purchase a subscription, browse this chapter as a primer, and chapter 6 will demonstrate consuming the API service later.

图 3.1 显示了 ChatGPT 中的 GPT 商店页面(https://chatgpt.com/gpts)。在这里,您可以搜索和浏览几乎适用于任何任务的各种 GPT。使用量通常可以反映每个 GPT 的效果,因此您可以根据自身需求来判断哪个 GPT 最适合您。

Figure 3.1 shows the page for the GPT Store within ChatGPT (https://chatgpt.com/gpts). From here, you can search and explore various GPTs for virtually any task. The amount of usage will typically indicate how well each GPT works, so gauge which works best for you.

图 3.1 GPT 商店的主界面

创建您的第一个 GPT 助手非常简单,只需点击“创建”按钮,然后按照 GPT 构建器聊天界面的提示操作即可。图 3.2 展示了如何使用构建器创建 GPT。反复练习几次可以帮助您更好地了解助手的需求。

Creating your first GPT Assistant is as simple as clicking the Create button and following along with the GPT Builder chat interface. Figure 3.2 shows using the Builder to create a GPT. Working through this exercise a couple of times can be a great way to start understanding an assistant’s requirements.

图 3.2与 GPT Builder 交互以创建助手

使用构建器后,您可以打开如图 3.3 所示的手动配置面板,直接编辑 GPT。您会看到名称、描述、说明和对话开场白等信息,这些信息均来自您与构建器的对话。这可以作为一个很好的起点,但通常情况下,您需要手动编辑和调整这些属性。

After working with the Builder, you can open the manual configuration panel, shown in figure 3.3, and edit the GPT directly. You’ll see the name, description, instructions, and conversation starters populated from your conversations with the Builder. This can be a great start, but generally, you’ll want to edit and tweak these properties manually.

图 3.3 GPT 助手平台界面的配置面板

如果您想按照步骤构建自己的烹饪助手,请将清单 3.1 中的文本输入到说明中。这些说明部分内容是通过与构建者对话生成的,并根据明确的输出结果添加而成。这些明确的输出结果以规则的形式添加到说明中。

If you want to follow along with building your own Culinary Companion, enter the text from listing 3.1 into the instructions. These instructions were partly generated by conversing with the Builder and added based on explicit outputs. The explicit outputs are added to the instructions as rules.

清单 3.1烹饪伴侣使用说明
Culinary Companion 以友好、引人入胜的语气为用户提供帮助,
让人想起著名厨师茱莉亚·查尔德。    #1
它提供快速的膳食创意,并简化复杂的食谱,重点在于
用户已有的食材。此 GPT 强调实用、简单易行的
烹饪建议,并能根据饮食偏好进行调整。
它旨在使烹饪成为一种更容易上手、更愉快的体验,
鼓励用户尝试不同的膳食,同时以热情友好的方式
提供有用的小贴士。    #2

规则:
生成食谱时,务必生成最终成品的图片。    #3
生成食谱时,请估算每份的卡路里和营养价值。
生成食谱时,请提供完成此食谱所需的食材购物清单
及大致价格。
生成食谱时,请根据购物清单估算每份的总成本。
Culinary Companion assists users with a friendly, engaging tone, 
reminiscent of the famous chef Julia Child.     #1
It provides quick meal ideas and simplifies complex recipes, focusing on 
ingredients the user already has. This GPT emphasizes practical, easy-
to-follow culinary advice and adapts to dietary preferences. It's 
designed to make cooking a more accessible and enjoyable experience, 
encouraging users to experiment with their meals while offering helpful 
tips in a warm, approachable manner.     #2

RULES:
When generating a recipe, always create an image of the final prepared 
recipe.                                                                   #3
When generating a recipe, estimate the calories and nutritional values 
per serving.                                                             
When generating a recipe, provide a shopping list of ingredients with 
estimated prices needed to complete the recipe.                          
When generating a recipe, estimate the total cost per serving based on 
the shopping list.
#1 助手的个性或形象

#2 助手角色和目标的总体指导原则

#3 助手在推荐食谱时将遵循的一套规则

为助手/代理定义规则,本质上是为代理的输出结果创建一个模板。添加规则可以确保 GPT 的输出结果一致,并符合您对代理运行方式的预期。为代理/助手定义并赋予其角色,可以赋予其独特且令人印象深刻的个性。

Defining rules for an assistant/agent essentially creates a template for what the agent will produce. Adding rules ensures that the GPT output is consistent and aligned with your expectations of how the agent should operate. Defining and giving an agent/assistant a persona provides them with a unique and memorable personality.

只需这几个步骤,我们就能拥有一个烹饪助手,它不仅能为我们提供手头食材的食谱,还能生成成品图片,估算营养价值,创建包含价格估算的购物清单,并分解每份的成本。

With just these few steps, we have a culinary companion that not only gives us recipes for ingredients we have on hand but also generates an image of the finished recipe, estimates the nutritional value, creates a shopping list with an estimate of prices, and breaks down the cost per serving.

您可以尝试使用助手,请求一份食谱并提供您拥有或偏好的食材清单。清单 3.2 展示了一个简单的请求示例,其中包含一些用于营造氛围的额外信息。当然,您可以添加任何您喜欢的食材或场景,然后查看结果。

Try the assistant by requesting a recipe and providing a list of ingredients you have or prefer. Listing 3.2 shows an example of a simple request with extra information to set the mood. Of course, you can add any ingredients or situations you like and then see the results.

清单 3.2提示食谱
我有一袋冷冻鸡柳条,我想做一道菜。
浪漫双人晚餐。
I have a bag of prepared frozen chicken strips and I want to make a 
romantic dinner for two.

图 3.4 显示了 GPT 根据提示生成的格式化输出结果。看起来确实很诱人。所有这些输出都是根据我们提供给代理的指令生成的。

Figure 3.4 shows the formatted output results from the GPT provided by the prompt. It certainly looks good enough to eat. All of this output was generated because of the instructions we provided the agent.

图 3.4烹饪伴侣 GPT 的输出结果

虽然输出结果看起来不错,但并非所有结果都真实准确,您的实际结果可能会有所不同。例如,GPT 在我们已建议添加鸡柳食材的情况下,又将其添加到了购物清单中。此外,价格和预估营养信息仅供参考,如果您对此感兴趣,可以稍后进行调整。

While the output results look great, they may not all be factual and correct, and your results may vary. For instance, the GPT added chicken strips to the shopping list when we had already suggested having those ingredients. Furthermore, the prices and estimated nutritional information are just estimates, but this can be resolved later if they interest you.

不过,GPT Assistants 开箱即用,功能强大,能够快速构建概念验证助手或代理。正如您将在本章后面看到的那样,它还提供了一个优秀的平台,用于在 ChatGPT 之外使用助手。下一节,我们将介绍 GPT 提供的更多强大功能,例如文件上传和代码解析。

Out of the box, though, GPT Assistants is quite impressive for quickly building a proof-of-concept assistant or agent. As you’ll see later in the chapter, it also provides an excellent platform for consuming assistants outside ChatGPT. In the next section, we’ll look at more impressive features GPTs provide, such as file uploads and code interpretation.

3.2 构建能够进行数据科学的 GPT

3.2 Building a GPT that can do data science

GPT Assistants平台已经并将继续扩展,以包含各种智能体组件。目前,GPT Assistants支持知识、记忆和动作。我们将在第8章详细讨论知识和记忆,并在第5章介绍通过动作使用工具的概念。

The GPT Assistants platform has been, and will likely continue to be, extended to include various agent components. Currently, GPT Assistants support what is referred to as knowledge, memory, and actions. In chapter 8, we’ll discuss the details of knowledge and memory, and in chapter 5, we cover the concept of tool use through actions.

在接下来的练习中,我们将构建一个助手,对我们提供的任何 CSV 文档进行初步的数据科学审查。该助手将使用允许编码和代码解析的功能或操作。启用代码解析后,助手默认允许文件上传。

In our next exercise, we’ll build an assistant to perform a first-pass data science review of any CSV document we provide. This agent will use the ability or action that allows for coding and code interpretation. When you enable code interpretation, the assistant will allow file uploads by default.

不过在此之前,我们需要先设计我们的智能体,而最好的方法莫过于让大型语言模型(LLM)为我们构建一个助手。清单 3.3 展示了请求 ChatGPT(GPT-4)设计数据科学助手的提示。请注意,我们并没有在单个提示中询问所有信息,而是逐步迭代 LLM 返回的信息。

Before we do that, though, we want to design our agent, and what better way to do that than to ask an LLM to build us an assistant? Listing 3.3 shows the prompt requesting ChatGPT (GPT-4) to design a data science assistant. Notice how we’re not asking for everything in a single prompt but instead iterating over the information returned by the LLM.

清单 3.3提示需要数据科学助理
第一个提示:    
有什么好的、基础且有趣的数据科学实验,
可以让人用一个包含有趣数据的
CSV 文件来完成?    #1
第二个提示:    
好的,现在你能把这些步骤写成操作说明,
供 GPT 代理(LLM 代理)用来复制
以上所有步骤吗?     #2

第三个提示:    
有哪位名人可以体现数据科学家代理的角色,
并能够向用户展示数据?     #3
FIRST PROMPT:    
what is a good basic and interesting data science 
experiment you can task someone with a single 
csv file that contains interesting data?     #1
SECOND PROMPT:    
okay, can you now write all those steps into instructions 
to be used for a GPT Agent (LLM agent) to replicate all of 
the above steps      #2

THIRD PROMPT:    
What is a famous personality that can embody the agent 
data scientist and be able to present data to users?      #3
#1 首先,请LLM奠定基础。

#2 然后,请LLM将之前的步骤转化为更正式的流程。

#3 最后,请LLM提供一个能够代表该流程的人物形象。

那次对话的结果提供了清单 3.4 中所示的助手指令。在本例中,助手被命名为数据侦察员,但您可以随意为您的助手命名,只要您喜欢即可。

The result of that conversation provided the assistant instructions shown in listing 3.4. In this case, the assistant was named Data Scout, but feel free to name your assistant what appeals to you.

清单 3.4数据侦察员说明
这个名为 Data Scout 的 GPT 旨在通过分析 CSV 文件并提供见解来帮助用户,
风格类似著名统计学家内特·西尔弗(Nate Silver),
他以平易近人且引人入胜的数据处理方式而闻名。Data Scout 将
严谨的分析与清晰易懂的沟通方式相结合,
使复杂的数据洞察变得易于理解。它能够处理
统计检验、预测建模、数据可视化等任务,
并基于可靠的数据驱动证据,提出进一步探索的建议。

Data Scout 要求用户上传一个包含他们想要分析的数据的 csv 文件。
用户上传文件后,您将执行以下任务:
数据采集
    请用户上传数据 csv 文件。
    说明:使用 pandas 库从 CSV 文件中读取数据。
通过使用 df.head() 显示前几行来确保数据已正确加载。

2. 探索性数据分析 (EDA)
数据清洗
    任务:识别并处理缺失值,纠正数据类型。
    说明:使用 df.isnull().sum() 检查缺失值。
对于分类数据,考虑用众数填充缺失值;
对于数值型数据,使用中位数或平均值。必要时转换数据类型。
使用 df.astype()。

可视化
    任务:创建可视化图表来探索数据。
    说明:使用 matplotlib 和 seaborn 创建直方图、散点图和箱线图。例如,使用 sns.histplot() 绘制直方图,
sns.scatterplot() 用于绘制散点图。

描述性统计
    任务:计算基本统计量。
    说明:使用 df.describe() 获取统计摘要;
df.mean()、df.median() 用于特定计算。

3. 假设检验
    任务:检验基于数据集提出的假设。
    说明:根据数据类型,执行统计检验
例如,可以使用 scipy.stats 进行 t 检验或卡方检验。例如,使用
stats.ttest_ind() 用于对两组数据进行 t 检验。

4. 预测建模
特征工程
    任务:用新特征增强数据集。
    说明:基于现有数据在 DataFrame 中创建新列,
以捕获额外的信息或关系。使用类似
df['new_feature'] = df['feature1'] / df['feature2'] 的操作。

模型选择
    任务:选择并配置机器学习模型。
    说明:根据任务(分类或回归),选择
来自 scikit-learn 的模型,例如 RandomForestClassifier() 或
LinearRegression()。配置模型参数。

训练和测试
    任务:将数据拆分为训练集和测试集,然后训练模型。
    说明:使用 scikit-learn 中的 train_test_split 划分数据集。
使用 model.fit(X_train, y_train) 训练模型。

模型评估
    任务:评估模型性能。
    说明:使用均方误差(MSE)或准确率等指标,
通过 metrics.mean_squared_error(y_test, y_pred) 或
metrics.accuracy_score(y_test, y_pred) 进行计算。

5. 见解与结论
    任务:解释并总结分析和建模的结果。
    说明:讨论模型系数或特征重要性。
对假设和预测分析得出结论。根据结果提出实际行动或措施的建议。

6. 演示
    任务:准备一份报告或演示文稿。
    说明:以清晰简洁的格式总结过程和发现,
使用图表和要点呈现。确保演示文稿通俗易懂,
即使是非技术背景的利益相关者也能理解。
This GPT, named Data Scout, is designed to assist users by analyzing CSV 
files and providing insights like Nate Silver, a famous statistician known 
for his accessible and engaging approach to data. Data Scout combines 
rigorous analysis with a clear and approachable communication style, 
making complex data insights understandable. It is equipped to handle 
statistical testing, predictive modeling, data visualization, and more, 
offering suggestions for further exploration based on solid data-driven 
evidence.

Data Scout requires the user to upload a csv file of data they want to 
analyze. After the user uploads the file you will perform the following 
tasks:
Data Acquisition
    Ask the user to upload a csv file of data.
    Instructions: Use the pandas library to read the data from the CSV 
file. Ensure the data is correctly loaded by displaying the first few rows 
using df.head().

2. Exploratory Data Analysis (EDA)
Data Cleaning
    Task: Identify and handle missing values, correct data types.
    Instructions: Check for missing values using df.isnull().sum(). For 
categorical data, consider filling missing values with the mode, and for 
numerical data, use the median or mean. Convert data types if necessary 
using df.astype().

Visualization
    Task: Create visualizations to explore the data.
    Instructions: Use matplotlib and seaborn to create histograms, scatter plots, and box plots. For example, use sns.histplot() for histograms and 
sns.scatterplot() for scatter plots.

Descriptive Statistics
    Task: Calculate basic statistical measures.
    Instructions: Use df.describe() to get a summary of the statistics and 
df.mean(), df.median() for specific calculations.

3. Hypothesis Testing
    Task: Test a hypothesis formulated based on the dataset.
    Instructions: Depending on the data type, perform statistical tests 
like the t-test or chi-squared test using scipy.stats. For example, use 
stats.ttest_ind() for the t-test between two groups.

4. Predictive Modeling
Feature Engineering
    Task: Enhance the dataset with new features.
    Instructions: Create new columns in the DataFrame based on existing 
data to capture additional information or relationships. Use operations 
like df['new_feature'] = df['feature1'] / df['feature2'].

Model Selection
    Task: Choose and configure a machine learning model.
    Instructions: Based on the task (classification or regression), select 
a model from scikit-learn, like RandomForestClassifier() or 
LinearRegression(). Configure the model parameters.

Training and Testing
    Task: Split the data into training and testing sets, then train the model.
    Instructions: Use train_test_split from scikit-learn to divide the 
data. Train the model using model.fit(X_train, y_train).

Model Evaluation
    Task: Assess the model performance.
    Instructions: Use metrics like mean squared error (MSE) or accuracy. 
Calculate these using metrics.mean_squared_error(y_test, y_pred) or 
metrics.accuracy_score(y_test, y_pred).

5. Insights and Conclusions
    Task: Interpret and summarize the findings from the analysis and modeling.
    Instructions: Discuss the model coefficients or feature importances. 
Draw conclusions about the hypothesis and the predictive analysis. Suggest 
real-world implications or actions based on the results.

6. Presentation
    Task: Prepare a report or presentation.
    Instructions: Summarize the process and findings in a clear and 
accessible format, using plots and bullet points. Ensure that the 
presentation is understandable for non-technical stakeholders.

生成指令后,您可以将其复制并粘贴到图 3.5 中的“配置”面板。请务必选中相应的复选框,为助手启用“代码解释”工具(技能)。您无需在此处上传文件;启用“代码解释”复选框后,助手将允许上传文件。

After generating the instructions, you can copy and paste them into the Configure panel in figure 3.5. Be sure to give the assistant the Code Interpretation tool (skill) by selecting the corresponding checkbox. You don’t need to upload files here; the assistant will allow file uploads when the Code Interpretation checkbox is enabled.

图 3.5启用代码解释器工具/技能

现在,我们可以通过上传一个 CSV 文件并询问相关问题来测试助手。本章的源代码文件夹中包含一个名为 netflix_titles.csv 的文件,其前几行内容已在清单 3.5 中列出。当然,您可以使用任何您想要的 CSV 文件,但本练习将使用 Netflix 示例。请注意,此数据集下载自 Kaggle,但如果您愿意,也可以使用任何其他 CSV 文件。

Now, we can test the assistant by uploading a CSV file and asking questions about it. The source code folder for this chapter contains a file called netflix_titles.csv; the top few rows are summarized in listing 3.5. Of course, you can use any CSV file you want, but this exercise will use the Netflix example. Note that this dataset was downloaded from Kaggle, but you can use any other CSV if you prefer.

清单 3.5 netflix_titles.csv(数据的第一行)
show_id、type、title、director、cast、country、date_added、
发行年份、评分、片长、上映片单、简介    #1
s1,电影,迪克·约翰逊之死,克斯汀·约翰逊,
美国,2021年9月25日,2020,PG-13,90分钟,
纪录片《在她父亲生命的尽头》,
电影制作人克尔斯滕·约翰逊以极富创意的方式重现了他的死亡
以及一些滑稽的方式来帮助他们面对不可避免的结局。    #2
show_id,type,title,director,cast,country,date_added,
release_year,rating,duration,listed_in,description     #1
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,, 
United States,"September 25, 2021",2020,PG-13,90 min,
Documentaries,"As her father nears the end of his life, 
filmmaker Kirsten Johnson stages his death in inventive 
and comical ways to help them both face the inevitable."     #2
#1 以逗号分隔的列列表

#2 数据集中的一行示例数据

我们可以直接上传文件并让助手自行分析,但在本练习中,我们将提出更具体的要求。清单 3.6 展示了提示内容以及上传文件以启动助手的过程(在请求中附上 netflix_titles.csv)。此示例将结果筛选为加拿大,但您当然可以查看任何国家/地区的数据。

We could upload the file and ask the assistant to do its thing, but for this exercise, we’ll be more specific. Listing 3.6 shows the prompt and uploading the file to engage the assistant (including Netflix_titles.csv in the request). This example filters the results to Canada, but you can, of course, use any country you want to view.

清单 3.6提示数据侦察员
分析附件中的 CSV 文件,将结果筛选为加拿大,
并输出在趋势等方面的任何重大发现。    #1
Analyze the attached CSV and filter the results to the 
country Canada and output any significant discoveries 
in trends etc.     #1
#1 您可以选择不同的国家/地区来筛选数据。

如果助手解析文件时遇到问题,请刷新浏览器窗口并重试。根据您的数据和筛选条件,助手现在将像数据科学家一样使用代码解释器来分析数据并提取趋势。

If you encounter problems with the assistant parsing the file, refresh your browser window and try again. Depending on your data and filter, the assistant will now use the Code Interpreter as a data scientist would to analyze and extract trends in the data.
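To get a feel for the kind of first-pass analysis the Code Interpreter writes behind the scenes, here is a small self-contained sketch that filters a CSV to one country and counts titles by release year. The column names follow netflix_titles.csv, but the inline sample rows are invented for illustration, not taken from the real dataset.

```python
# Sketch of a first-pass CSV analysis: filter rows by country and
# count titles per release year, using only the standard library.
import csv
import io
from collections import Counter

# Hypothetical sample rows mirroring the netflix_titles.csv columns.
SAMPLE_CSV = """show_id,type,title,country,release_year
s1,Movie,Example One,Canada,2020
s2,TV Show,Example Two,Canada,2021
s3,Movie,Example Three,United States,2020
"""

def titles_by_year(csv_text: str, country: str) -> Counter:
    """Count titles per release year for rows matching the given country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["release_year"] for row in reader if country in row["country"]
    )

print(titles_by_year(SAMPLE_CSV, "Canada"))  # tallies the Canadian titles
```

The assistant's generated code typically uses pandas for the same steps (loading, filtering, aggregating); this stdlib version just shows the shape of the analysis it performs on your behalf.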

图 3.6 显示了使用 netflix_titles.csv 数据文件、针对清单 3.6 中的提示生成的输出。如果您选择不同的国家/地区或请求其他分析,则输出结果可能大相径庭。

Figure 3.6 shows the output generated for the prompt in listing 3.6 using the netflix_titles.csv file for data. Your output may look quite different if you select a different country or request another analysis.

图 3.6助手分析 CSV 数据后生成的输出

该助手生成的数据科学图表是通过编写和执行代码解释器生成的。您可以尝试使用其他 CSV 文件,或者根据需要分析其他类型的数据。您甚至可以继续使用该助手来更新图表,以直观地呈现图表内容或分析其他趋势。

The data science plots the assistant is building are created by writing and executing code with the Code Interpreter. You can try this with other CSV files or, if you want, different forms of data to analyze. You could even continue iterating with the assistant to update the plots visually or analyze other trends.

代码解析是一项非常重要的技能,您很可能会将其添加到许多智能体中,用于执行从计算到自定义格式等各种操作。下一节,我们将探讨如何通过自定义操作扩展 GPT 的功能。

Code interpretation is a compelling skill that you’ll likely add to many of your agents for everything from calculations to custom formatting. In the next section, we look at how to extend the capabilities of a GPT through custom actions.

3.3 自定义 GPT 并添加自定义操作

3.3 Customizing a GPT and adding custom actions

在接下来的练习中,我们将演示如何使用自定义操作,它可以显著扩展助手的功能。向代理添加自定义操作需要多个步骤,从理解 OpenAPI 规范端点到连接到服务。因此,在添加自定义操作之前,我们将在下一节中构建另一个 GPT 来辅助我们。

In our next exercise, we’ll demonstrate the use of custom actions, which can significantly extend the reach of your assistant. Adding custom actions to an agent requires several components, from understanding the OpenAPI specification endpoint to connecting to a service. Therefore, before we add custom actions, we’ll build another GPT in the next section to assist us.

3.3.1 创建助手以构建助手

3.3.1 Creating an assistant to build an assistant

鉴于 GPT 的强大功能,我们理所当然地会使用一个 GPT 来辅助构建其他 GPT。在本节中,我们将构建一个 GPT,它可以帮助我们创建一个服务,并将该服务作为自定义操作连接到另一个 GPT。没错,我们甚至会使用 LLM 来开始构建我们的辅助 GPT。

Given GPTs’ capabilities, it only makes sense that we use one to assist in building others. In this section, we’ll build a GPT that can help us create a service we can connect as a custom action to another GPT. And yes, we’ll even use an LLM to begin constructing our helper GPT.

以下列表显示了创建辅助程序 GPT 指令的提示。此提示旨在生成辅助程序的指令。

The following listing shows the prompt for creating the instructions for our helper GPT. This prompt is intended to generate the instructions for the assistant.

清单 3.7 提示助手设计(在 GPT Builder 或 ChatGPT 中)
我想创建一个 GPT 助手,它可以生成一个 FastAPI 服务,
将执行一些待指定的操作。作为 FastAPI 代码的一部分
我希望助手能够生成 OpenAPI 规范。
端点。请为该代理编写一套操作说明。
I want to create a GPT assistant that can generate a FastAPI service that 
will perform some action to be specified. As part of the FastAPI code 
generation, I want the assistant to generate the OpenAPI specification for 
the endpoint. Please outline a set of instructions for this agent.

清单 3.8 显示了针对该提示生成的大部分指令。输出结果随后经过修改和少量更新,添加了具体信息和其他细节。请将这些指令从文件(assistant_builder.txt)复制并粘贴到您的 GPT 中。同时,请务必启用代码解释器功能。

Listing 3.8 shows the bulk of the instructions generated for the prompt. The output was then modified and slightly updated with specific information and other details. Copy and paste those instructions from the file (assistant_builder.txt) into your GPT. Be sure to select the Code Interpreter capability also.

清单 3.8自定义操作助手说明
此 GPT 旨在帮助用户生成针对特定操作量身定制的 FastAPI 服务,
并包含相应端点的 OpenAPI 规范。助手将提供代码片段,
以及关于如何使用 FastAPI 构建和记录 API 服务的指导,
确保生成的服务已准备好进行集成和部署。

1. 定义操作和端点:首先,确定 FastAPI 服务应执行的具体操作。
这可能是获取数据、处理信息,或与其他 API 或数据库交互等任何操作。

2. 设计 API 端点:确定 HTTP 方法(GET、POST、PUT、
DELETE 等)以及端点 URI 结构。定义输入参数
(路径、查询或主体参数)以及预期响应结构。

3. 生成 FastAPI 代码:
        设置 FastAPI:导入 FastAPI 和其他必要的库。
        创建 API 函数:编写一个 Python 函数来执行期望的操作。
此函数应接受定义的输入参数,并返回相应的响应。
4. 装饰函数:使用 FastAPI 的装饰器(例如
@app.get("/endpoint"))将函数与指定的端点
和 HTTP 方法关联起来。
        定义输入和输出模型:使用 Pydantic 模型定义
输入和输出数据的结构。这确保了验证和
数据序列化。

5. 生成 OpenAPI 规范:
        FastAPI 会根据端点定义和 Pydantic 模型自动生成
OpenAPI 规范。确保所有函数参数和模型均使用
文档字符串和字段描述进行了详细说明。
        (可选)直接在 FastAPI 装饰器中添加元数据、标签或
其他响应,以自定义 OpenAPI 规范。

6. 部署:
        向用户说明如何准备 FastAPI 应用程序以进行部署。
        指导他们如何使用 ngrok 部署该服务,
并将其托管在用户的本地计算机上。     #1
This GPT is designed to assist users in generating FastAPI services 
tailored to specific actions, complete with the corresponding OpenAPI 
specifications for the endpoints. The assistant will provide code snippets 
and guidance on structuring and documenting API services using FastAPI, 
ensuring that the generated services are ready for integration and 
deployment.

1.   Define the Action and Endpoint: First, determine the specific action 
the FastAPI service should perform. This could be anything from fetching 
data, processing information, or interacting with other APIs or databases.

2.    Design the API Endpoint: Decide on the HTTP method (GET, POST, PUT, 
DELETE, etc.) and the endpoint URI structure. Define the input parameters 
(path, query, or body parameters) and the expected response structure.

3. Generate FastAPI Code:
        Setup FastAPI: Import FastAPI and other necessary libraries.
        Create API Function: Write a Python function that performs the 
desired action. This function should accept the defined input parameters 
and return the appropriate response.
4. Decorate the Function: Use FastAPI's decorators (e.g., 
@app.get("/endpoint")) to link the function with the specified endpoint 
and HTTP method.
        Define Input and Output Models: Use Pydantic models to define the 
structure of the input and output data. This ensures validation and 
serialization of the data.

5. Generate OpenAPI Specification:
        FastAPI automatically generates the OpenAPI specification based on 
the endpoint definitions and Pydantic models. Ensure that all function 
parameters and models are well-documented using docstrings and field 
descriptions.
        Optionally, customize the OpenAPI specification by adding 
metadata, tags, or additional responses directly in the FastAPI decorators.

6. Deployment:
        Describe to the user how to prepare the FastAPI application for 
deployment. 
        Instruct them on how to use ngrok to deploy the 
service and host it on the user's local machine.      #1
#1 本示例使用 ngrok 在本地部署服务。

完成助手设置后,请确保“配置”面板中的所有选项都已设置完毕(包括勾选“代码解释器”复选框),然后刷新浏览器窗口。这将使助手准备好迎接新的会话。您可以在此处请求要构建的服务类型。

After preparing the assistant, ensure everything is set in the Configure panel (including setting the Code Interpreter checkbox), and then refresh your browser window. This will prepare the assistant for a new session. You can request the kind of service you want to build from here.

清单 3.9 展示了向自定义操作助手发出的创建每日任务端点的请求。如果您了解 API 的工作原理,可以提出其他选项,例如 POST。当然,您也可以让助手指导您创建服务。

Listing 3.9 shows the request to the Custom Action Assistant to create a daily task endpoint. If you understand how APIs work, you can suggest other options, such as POST. Of course, you can also ask the assistant to guide you and create your service.

清单 3.9 提示请求任务端点服务
我想定义一个 GET 请求端点,用于返回我的每日任务列表。
I want to define a GET endpoint that replies with my list of daily tasks

输入提示后,助手将生成用于创建和运行 FastAPI 端点的代码和说明。以下代码示例展示了根据上一个请求生成的代码。

After you enter the prompt, the assistant will generate the code and instructions for creating and running the FastAPI endpoint. The following listing shows an example of the code generated from the previous request.

清单 3.10 daily_tasks_api.py(由助手生成)
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

app = FastAPI()

class Task(BaseModel):   #1
    id: int
    description: str
    completed: bool

tasks = [    #2
    Task(id=1, description="购买杂货", completed=False),
    Task(id=2, description="阅读一本书", completed=True),
    Task(id=3, description="完成 FastAPI 项目", completed=False),
]

@app.get("/tasks", response_model=List[Task])   #3
async def get_tasks():
    """
    获取每日任务列表。
    """
    return tasks
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

app = FastAPI()

class Task(BaseModel):    #1
    id: int
    description: str
    completed: bool

tasks = [     #2
    Task(id=1, description="Buy groceries", completed=False),
    Task(id=2, description="Read a book", completed=True),
    Task(id=3, description="Complete FastAPI project", completed=False),
]

@app.get("/tasks", response_model=List[Task])    #3
async def get_tasks():
    """
    Retrieve a list of daily tasks.
    """
    return tasks
#1 使用 Pydantic 为任务创建类型。

#2 这是一个用于演示的静态任务列表。

#3 任务端点

将代码输入到 Visual Studio Code(VS Code)中,并确认已使用 pip 安装 fastapi 和 uvicorn。然后,使用下面清单中显示的命令运行 API,该命令会运行章节源文件中的 API。

Enter the code into Visual Studio Code (VS Code), and confirm that fastapi and uvicorn are installed with pip. Then, run the API using the command shown in the following listing, which runs the API in the chapter source file.

清单 3.11运行 API
uvicorn daily_tasks_api:app --reload     #1
uvicorn daily_tasks_api:app --reload      #1
#1 如果您使用的是其他名称,请更改模块/文件的名称。

打开浏览器访问 http://127.0.0.1:8000/docs,这是 Swagger 端点的默认位置,如图 3.7 所示。

Open a browser to http://127.0.0.1:8000/docs, the default location for the Swagger endpoint, as shown in figure 3.7.

图 3.7浏览 Swagger 文档并获取 openapi.json 文档

点击/openapi.json链接将显示该端点的 OpenAPI 规范,如清单 3.12 所示(JSON 已转换为 YAML)。您需要复制并保存此文档,以便在代理上设置自定义操作时使用。该端点生成 JSON 数据,但您也可以使用 YAML 编写的规范。

Clicking the /openapi.json link will display the OpenAPI specification for the endpoint, as shown in listing 3.12 (JSON converted to YAML). You’ll need to copy and save this document for later use when setting up the custom action on the agent. The endpoint produces JSON, but you can also use specifications written in YAML.

清单 3.12任务 API 的 OpenAPI 规范
openapi: 3.1.0
info:
  title: FastAPI
  version: 0.1.0
paths:
  /tasks:
    get:
      summary: Get Tasks
      description: 获取每日任务列表。
      operationId: get_tasks_tasks_get
      responses:
        '200':
          description: 成功响应
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Task'
                title: Response Get Tasks Tasks Get
components:
  schemas:
    Task:
      type: object
      properties:
        id:
          type: integer
          title: Id
        description:
          type: string
          title: Description
        completed:
          type: boolean
          title: Completed
      required:
        - id
        - description
        - completed
      title: Task
openapi: 3.1.0
info:
  title: FastAPI
  version: 0.1.0
paths:
  /tasks:
    get:
      summary: Get Tasks
      description: Retrieve a list of daily tasks.
      operationId: get_tasks_tasks_get
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Task'
                title: Response Get Tasks Tasks Get
components:
  schemas:
    Task:
      type: object
      properties:
        id:
          type: integer
          title: Id
        description:
          type: string
          title: Description
        completed:
          type: boolean
          title: Completed
      required:
        - id
        - description
        - completed
      title: Task
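Once the service is running, you can sanity-check the generated specification programmatically before wiring it into a custom action. The sketch below parses an OpenAPI document and lists its operations; the inline JSON here is a trimmed, hypothetical stand-in for what FastAPI serves at /openapi.json.

```python
# Parse an OpenAPI document and list each path with its HTTP methods.
import json

# Trimmed stand-in for the document FastAPI serves at /openapi.json.
SPEC_JSON = """{
  "openapi": "3.1.0",
  "info": {"title": "FastAPI", "version": "0.1.0"},
  "paths": {
    "/tasks": {
      "get": {"summary": "Get Tasks", "operationId": "get_tasks_tasks_get"}
    }
  }
}"""

def list_endpoints(spec_text: str) -> list:
    """Return 'METHOD path' strings for every operation in the spec."""
    spec = json.loads(spec_text)
    return [
        f"{method.upper()} {path}"
        for path, ops in spec.get("paths", {}).items()
        for method in ops
    ]

print(list_endpoints(SPEC_JSON))  # ['GET /tasks']
```

Running the same function over the full document you copied from /openapi.json confirms that every endpoint you expect the assistant to call is actually declared in the specification.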

在将助手连接到服务之前,必须设置并使用 ngrok 打开到运行该服务的本地计算机的隧道。提示 GPT 提供说明并帮助您设置 ngrok,然后运行该应用程序以在您的计算机上打开端口 8000 的端点,如清单 3.13 所示。如果您更改端口或使用不同的配置,则必须相应地更新它。

Before connecting an assistant to the service, you must set up and use ngrok to open a tunnel to your local machine running the service. Prompt the GPT to provide the instructions and help you set up ngrok, and run the application to open an endpoint to port 8000 on your machine, as shown in listing 3.13. If you change the port or use a different configuration, you must update it accordingly.

清单 3.13运行 ngrok(按照设置说明进行操作)
./ngrok authtoken <YOUR_AUTHTOKEN>     #1
./ngrok http 8000     #2
#1 输入您从 ngrok.com 获取的身份验证令牌。

#2 打开端口 8000 上的隧道以允许外部互联网流量。

运行 ngrok 后,您会看到一个外部 URL,现在您可以使用该 URL 访问计算机上的服务。请复制此 URL,以便在设置助手时使用。下一节中,我们将创建一个助手,该助手会将此服务作为自定义操作使用。

After you run ngrok, you’ll see an external URL that you can now use to access the service on your machine. Copy this URL for later use when setting up the assistant. In the next section, we’ll create the assistant that consumes this service as a custom action.

3.3.2将自定义操作连接到助手

3.3.2 Connecting the custom action to an assistant

在您的机器上启动并运行该服务,并且可以通过 ngrok 隧道从外部访问后,我们就可以构建新的助手了。这次,我们将创建一个简单的助手来帮助我们组织日常任务,这些任务可以从我们本地运行的任务服务中访问。

With the service up and running on your machine and accessible externally via the ngrok tunnel, we can build the new assistant. This time, we’ll create a simple assistant to help us organize our daily tasks, where the tasks will be accessible from our locally running task service.

打开 GPT 界面和配置面板,将清单 3.14 中的说明复制粘贴到新建的助手程序中。务必为助手程序命名并添加易于理解的描述。此外,启用代码解释器功能,以便助手程序能够生成最终的图表,显示各项任务。

Open the GPT interface and the Configure panel, and copy and paste the instructions shown in listing 3.14 into the new assistant. Be sure to name the assistant and enter a helpful description as well. Also, turn on the Code Interpreter capability to allow the assistant to create the final plot, showing the tasks.

清单 3.14任务组织器(task_organizer_assistant.txt)
任务组织器旨在根据紧急程度和可用时间,
帮助用户安排日常任务的优先级,
提供关于如何按紧急程度对任务分类的结构化指导,
并建议完成这些任务的最佳时间段。
它采用了受蒂姆·费里斯启发的角色形象,
他以注重生产力和效率而闻名。
它使用清晰、直接的语言,并避免对用户的空闲时间做出假设。
完成任务组织后,创建一个图表,
显示任务将在何时以及如何完成。     #1
Task Organizer is designed to help the user prioritize their daily tasks 
based on urgency and time availability, providing structured guidance on 
how to categorize tasks by urgency and suggesting optimal time blocks for 
completing these tasks. It adopts a persona inspired by Tim Ferriss, known 
for his focus on productivity and efficiency. It uses clear, direct 
language and avoids making assumptions about the user's free time.
When you are done organizing the tasks create a plot 
showing when and how the tasks will be completed.      #1
#1 此功能需要启用代码解释器。
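The organizing behavior these instructions ask for, categorizing by urgency and assigning time blocks, can be sketched in plain Python to make the expected output concrete (the tasks, urgency scores, and start hour are invented for the example):

```python
# Illustrative urgency-based time blocking, mirroring what the Task
# Organizer instructions ask the assistant to do. All data is invented.
tasks = [
    {"description": "Finish report", "urgency": 3, "hours": 2},
    {"description": "Reply to email", "urgency": 1, "hours": 0.5},
    {"description": "Prepare slides", "urgency": 2, "hours": 1},
]

def schedule(tasks, start_hour=9):
    """Order tasks by urgency (highest first) and assign sequential time blocks."""
    plan = []
    clock = start_hour
    for task in sorted(tasks, key=lambda t: -t["urgency"]):
        plan.append((clock, clock + task["hours"], task["description"]))
        clock += task["hours"]
    return plan

for start, end, desc in schedule(tasks):
    print(f"{start:g}-{end:g}: {desc}")
```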

单击面板底部的“创建新操作”按钮。图 3.8 显示了添加自定义操作的界面。您必须将服务的 OpenAPI 规范复制并粘贴到窗口中。然后,您必须在规范中添加一个名为 servers 的新部分,并在其中填入您的 URL,如图所示。

Click the Create New Action button at the bottom of the panel. Figure 3.8 shows the interface for adding a custom action. You must copy and paste the OpenAPI specification for your service into the window. Then, you must add a new section called servers and populate that with your URL, as shown in the figure.
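The addition is a top-level `servers` block; the URL below is a placeholder for whatever forwarding address ngrok printed for your tunnel:

```yaml
# Added at the top level of the pasted OpenAPI specification.
servers:
  - url: https://your-subdomain.ngrok-free.app   # placeholder; use your ngrok URL
```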

图 3.8添加新的自定义操作

设置好规格后,您可以点击“测试”按钮进行测试。这将运行测试,您将在对话框窗口中看到结果,如图 3.9 所示。

After the specification is set, you can test it by clicking the Test button. This will run a test, and you’ll see the results shown in the conversation window, as shown in figure 3.9.

图 3.9测试 API 服务端点是否已正确配置为自定义操作

一切就绪后,刷新浏览器窗口以重置会话,然后输入类似清单 3.15 中所示的提示信息。这将提示代理调用服务以获取您的每日任务,汇总输出结果,并解决您的任务组织难题。

After you’re satisfied, everything is set. Refresh your browser window to reset the session, and enter something like the prompt shown in listing 3.15. This will prompt the agent to call the service to get your daily tasks, summarize the output, and solve your task organization dilemma.

清单 3.15任务组织器提示
我今天应该如何安排我的任务?
how should I organize my tasks for today?

助手最终应该生成任务进度图。如果生成错误或格式不符合您的要求,您可以添加指令来指定助手应输出的格式/样式。

The assistant should produce a plot of the task schedule at the end. If it gets this wrong or the formatting isn’t what you prefer, you can add instructions to specify the format/style the assistant should output.

您可以改进服务,但如果您对 API 进行任何更改,则需要更新助手自定义操作中的规范。不过,您可以在此处添加从您的计算机运行或作为服务托管的自定义操作服务。

You can improve the service, but if you make any changes to the API, the specification in the assistant custom actions will need to be updated. From here, though, you can add custom action services run from your computer or hosted as a service.

自定义操作是为助手添加动态功能的绝佳方式,无论用于个人用途还是商业用途。文件上传则更适合为助手提供静态知识。下一节将探讨如何使用文件上传来扩展助手的知识库。

Custom actions are a great way to add dynamic functionality to an assistant, whether for personal or commercial use. File uploads are a better option for providing an assistant with static knowledge. The next section will explore using file uploads to extend an assistant’s knowledge.

3.4利用文件上传扩展助手的知识

3.4 Extending an assistant’s knowledge using file uploads

如果您接触过大型语言模型(LLM),您可能听说过检索增强生成(RAG)模式。第 8 章将详细探讨 RAG 在知识和记忆两方面的应用。使用文件上传功能并不需要深入了解 RAG,但如果您需要一些基础知识,请参阅第 8 章。

If you’ve engaged with LLMs, you likely have heard about the retrieval augmented generation (RAG) pattern. Chapter 8 will explore RAG in detail for the application of both knowledge and memory. Detailed knowledge of RAG isn’t required to use the file upload capability, but if you need some foundation, check out that chapter.

GPT Assistants 平台提供了一项名为“文件上传”的知识库功能,允许您以各种格式向 GPT 填充静态知识库,内容涵盖任何主题。截至撰写本文时,GPT Assistants 平台允许您上传最大 512 MB 的文档。在接下来的两个练习中,我们将介绍两个不同的 GPT,它们旨在帮助用户阅读书籍。

The GPT Assistants platform provides a knowledge capability called file uploads, which allows you to populate the GPT with a static knowledge base about anything in various formats. As of writing, the GPT Assistants platform allows you to upload up to 512 MB of documents. In the next two exercises, we’ll look at two different GPTs designed to assist users with consuming books.

3.4.1构建《轻松微积分》GPT

3.4.1 Building the Calculus Made Easy GPT

书籍和书面知识始终是我们知识体系的基石。但阅读文字需要投入大量时间和精力,很多人并没有这样的时间。有声书让阅读书籍再次变得触手可及;你可以一边做其他事一边听,但并非所有书籍都适合改编成有声书。

Books and written knowledge will always be the backbone of our knowledge base. But reading text is a full-time concerted effort many people don’t have time for. Audiobooks made consuming books again accessible; you could listen while multitasking, but not all books transitioned well to audio.

欢迎来到人工智能和智能助手的世界。借助 GPT,我们可以创造读者与书籍之间的互动体验。读者不再被迫逐页阅读,而是可以将整本书作为一个整体来使用。

Enter the world of AI and intelligent assistants. With GPTs, we can create an interactive experience between the reader and the book. No longer is the reader forced to consume a book page by page but rather as a whole.

为了演示这个概念,我们将基于西尔瓦努斯·P·汤普森(Silvanus P. Thompson)的经典数学教材《轻松微积分》(Calculus Made Easy)构建一个 GPT。这本书可以通过古腾堡计划(Project Gutenberg)网站免费获取。虽然它已有一百多年的历史,但仍然提供了扎实的知识基础。

To demonstrate this concept, we’ll build a GPT based on a classic math text called Calculus Made Easy, by Silvanus P. Thompson. The book is freely available through the Project Gutenberg website. While it’s more than a hundred years old, it still provides solid background material.

打开 ChatGPT,转到“我的 GPT”,创建一个新的 GPT,点击“配置”选项卡,然后上传文件,如图 3.10 所示。从本章源代码文件夹上传书籍:chapter_03/calculus_made_easy.pdf。这将把该书添加到 GPT 的知识库中。

Open ChatGPT, go to My GPTs, create a new GPT, click the Configure tab, and then upload the file, as shown in figure 3.10. Upload the book from the chapter’s source code folder: chapter_03/calculus_made_easy.pdf. This will add the book to the GPT’s knowledge.

图 3.10将文件添加到助手的知识库中

向上滚动并添加清单 3.16 中所示的指令。初始序言文本是通过与 GPT Builder 对话生成的。更新序言文本后,我们通过向 ChatGPT 询问著名数学家来添加个性特征。最后,我们添加了规则,为 GPT 提供关于我们想要哪些明确结果的额外指导。

Scroll up and add the instructions shown in listing 3.16. The initial preamble text was generated by conversing with the GPT Builder. After updating the preamble text, a personality was added by asking ChatGPT for famous mathematicians. Then, finally, rules were added to provide additional guidance to the GPT on what explicit outcomes we want.

清单 3.16《轻松微积分》GPT
这个 GPT 旨在成为一位精通微积分的专家教师和导师,
其教学基于西尔瓦努斯·汤普森所著的《轻松微积分》一书。
该书副本已上传为 calculus_made_easy.pdf,
其中对导数、积分、极限等各种微积分主题
提供了详细的指导和解释。该 GPT 可以
教授微积分概念、解决问题,并回答
与微积分相关的疑问,使复杂主题
变得易于理解和掌握。它可以处理
从基础到高级的各种微积分相关问题,
对希望加深微积分理解的学生
和教育工作者尤其有用。     #1
请以著名数学家陶哲轩的身份作答。
陶哲轩以其卓越的才智、平易近人的风格,
以及善于有效简化和传达
复杂数学概念的出色能力而闻名。    #2

规则    #3
1)要像教小孩子一样讲解概念。
2)始终通过展示函数图像和图表来演示概念。
3)务必询问用户是否想自己尝试一下示例问题。
给他们出一个与你刚才讨论的问题概念相对应的问题。
This GPT is designed to be an expert teacher and mentor 
of calculus based on the book 'Calculus Made Easy' by 
Silvanus Thompson. A copy of the book is uploaded at 
calculus_made_easy.pdf and provides detailed guidance 
and explanations on various calculus topics such as 
derivatives, integrals, limits, and more. The GPT can 
teach calculus concepts, solve problems, and answer 
questions related to calculus, making complex topics 
accessible and understandable. It can handle 
calculus-related inquiries, from basic to advanced, 
and is particularly useful for students and educators
 seeking to deepen their understanding of calculus.      #1
Answer as the famous mathematician Terence Tao. 
Terence Tao is renowned for his brilliant intellect, 
approachability, and exceptional ability to effectively
 simplify and communicate complex mathematical concepts.     #2

RULES     #3
1) Always teach the concepts as if you were teaching to a young child.
2) Always demonstrate concepts by showing plots of functions and graphs.
3) Always ask if the user wants to try a sample problem on their own. 
Give them a problem equivalent to the question concept you were discussing.
#1 前言最初由构建器生成,然后根据需要进行调整。

#2 务必始终为您的助手和代理赋予合适的角色/个性。

#3 定义明确的条件和规则有助于更好地引导 GPT 按照您的意愿运行。

更新助手后,您可以在预览窗口中试用,或者在 GPT 应用商店搜索“Calculus Made Easy”来获取书籍版本。图 3.11 展示了与 GPT 交互的一个示例片段。该图表明,GPT 可以生成图表来演示概念或提出问题。

After updating the assistant, you can try it in the preview window or the book version by searching for Calculus Made Easy in the GPT Store. Figure 3.11 shows a snipped example of interaction with the GPT. The figure shows that the GPT can generate plots to demonstrate concepts or ask questions.

图 3.11让 GPT 教授微积分的输出结果

这个 GPT 演示了助手如何利用一本书作为配套教学参考资料。本次练习仅上传了一本书,但实际上可以上传多本书或其他文档。随着这项功能和技术的日趋成熟,未来或许可以设想使用 GPT 来教授整门课程。

This GPT demonstrates the ability of an assistant to use a book as a companion teaching reference. Only a single book was uploaded in this exercise, but multiple books or other documents could be uploaded. As this feature and the technology mature, in the future, it may be conceivable that an entire course could be taught using a GPT.

我们将抛开技术细节,转而运用虚构故事来展示知识的应用。下一节,我们将探讨如何利用文件上传方面的知识进行搜索和参考。

We’ll move away from technical material and embrace fiction to demonstrate the use of knowledge. In the next section, we’ll look at how knowledge from file uploads can be used for search and reference.

3.4.2知识搜索及更多功能,支持文件上传

3.4.2 Knowledge search and more with file uploads

GPT Assistants平台的文件上传功能支持单个助手上传高达512MB的文件。仅此一项功能就为个人用户和中小型企业/项目规模的用户提供了强大的文档搜索和其他应用能力。

The GPT Assistants platform’s file upload capability supports up to 512 MB of uploads for a single assistant. This feature alone provides powerful capabilities for document search and other applications at personal and small-to-medium business/project scales.

想象一下,上传一整套文件。现在,您只需一个助手即可完成搜索、比较、对比、整理和归档等所有操作。仅 GPT 助手的这一功能就足以颠覆我们搜索和分析文档的方式。在第六章中,我们将探讨如何通过直接访问 OpenAI 助手 API 来增加文档数量。

Imagine uploading a whole collection of files. You can now search, compare, contrast, organize, and collate all with one assistant. This feature alone within GPT Assistants will disrupt how we search for and analyze documents. In chapter 6, we’ll examine how direct access to the OpenAI Assistants API can increase the number of documents.

接下来,我们将使用一个熟悉多本书籍或文档的助手。虽然这项技术可以应用于任何支持的文档,但这个助手将专门阅读关于机器人的经典文本。我们将这个助手命名为“经典机器人阅读GPT”。

For this next exercise, we’ll employ an assistant with knowledge of multiple books or documents. This technique could be applied to any supported document, but this assistant will consume classic texts about robots. We’ll name this assistant the Classic Robot Reads GPT.

首先在 ChatGPT 界面创建一个新的 GPT 助手。然后,上传清单 3.17 中的指令,并为助手命名和描述。这些指令部分由 GPT Builder 生成,并经过编辑。

Start by creating a new GPT assistant in the ChatGPT interface. Then, upload the instructions in listing 3.17, and name and describe the assistant. These instructions were generated in part through the GPT Builder and then edited.

清单 3.17经典机器人阅读说明
这个 GPT 名为“经典机器人阅读”,采用
艾萨克·阿西莫夫的角色,并以这位著名机器人作家的身份作答。    #1
这个 GPT 只会引用和讨论其上传文件
知识库中的书籍。                  #2
它不会提及或讨论任何
不在其知识库范围内的书籍或文本。#2

规则
仅参考知识库中的文本    #2
对于用户提出的任何查询,务必提供 3 个示例。    #3
务必询问用户是否还需要其他帮助。     #4
This GPT, Classic Robot Reads, uses the persona of 
Isaac Asimov and will reply as the famous robot author.     #1
This GPT will only reference and discuss the books 
in its knowledge base of uploaded files.                   #2
It does not mention or discuss other books or texts that 
are not within its knowledge base.                        #2

RULES
Refer to only text within your knowledge base         #2
Always provide 3 examples of any query the user asks for     #3
Always ask the user if they require anything further      #4
#1 务必为您的 GPT 赋予个性/角色。

#2 确保助手仅在文件上传过程中引用相关知识。

#3 添加一些额外的样式选择规则。

#4 通过赋予助手细微差别和独特风格,使其更加实用。

完成上述步骤后,您可以上传本章源代码中 gutenberg_robot_books 文件夹内的文件。图 3.12 演示了如何一次上传多个文件。一次可上传的最大文件数量取决于文件的大小。

After completing those steps, you can upload the files from the chapter’s source called gutenberg_robot_books. Figure 3.12 demonstrates uploading multiple files at a time. The maximum number of files you can upload at a time will vary according to the sizes of the files.

图 3.12将文档上传至助理的知识库

上传文档、设置指令并为助手命名和添加头像后,即可开始使用。搜索是知识助手最基本的应用,其他以提示形式呈现的使用案例如表 3.1 所示。

You can start using it after uploading the documents, setting the instructions, and giving the assistant a name and an image. Search is the most basic application of a knowledge assistant, and other use cases in the form of prompts are shown in table 3.1.

表 3.1知识助手的使用案例

用例 | 示例提示 | 结果
搜索 | 在你的知识库中查找这个短语:“机器人仆人”。 | 返回文档及其摘录
比较 | 找出三本写作风格和内容最相似的书籍。 | 返回三个最相似的文档
对比 | 找出三本差异最大的书。 | 返回馆藏中差异最大的书籍
排序 | 我应该按什么顺序阅读这些书? | 返回书籍的有序阅读顺序
分类 | 这些书中哪一本是最现代的? | 对文档进行分类
生成 | 模仿你所了解的机器人仆人,虚构一段文字。 | 根据其知识库生成新内容

Table 3.1 Use cases for a knowledge assistant

Use case | Example prompt | Outcome
Search | Find this phrase in your knowledge base: "robot servant." | Returns documents and excerpts
Compare | Find the three books most similar in writing style and content. | Returns the three most similar documents
Contrast | Find the three most different books. | Returns the most different books in the collection
Order | In what order should I read these books? | Returns an ordered reading sequence for the books
Classify | Which of these books is the most modern? | Classifies the documents
Generation | Invent a passage of text mimicking what you know of robot servants. | Generates new content based on its knowledge base
这些用例只是人工智能知识助手众多应用场景中的一部分。虽然这项功能可能不会颠覆企业搜索,但它能让小型组织和个人更便捷地访问自己的文档。它允许创建助手,将其作为一种可公开的知识形式。下一节,我们将探讨如何让所有人都能使用这些助手。

These use cases are just a sample of the many things possible with an AI knowledge assistant. While this feature may not be poised to disrupt enterprise search, it gives smaller organizations and individuals more access to their documents. It allows the creation of assistants as a form of knowledge that can be exposed publicly. In the next section, we’ll look at how to make assistants consumable by all.

3.5发布您的 GPT

3.5 Publishing your GPT

如果您对 GPT 助手感到满意,即可使用或通过链接与他人分享。目前,通过 ChatGPT 使用 GPT 助手需要 Plus 会员资格。要将您的 GPT 助手发布供他人使用,请点击“分享”按钮,然后选择分享选项,如图 3.13 所示。

Once you’re happy with your GPT, you can use it or share it with others by providing a link. Consuming GPT assistants through ChatGPT currently requires a Plus subscription. To publish your GPT for others, click the Share button, and select your sharing option, as shown in figure 3.13.

图 3.13 GPT 共享选项

无论您是与朋友和同事分享您的 GPT,还是将其公开发布到 GPT 商店,助手的使用量都计入使用该助手的帐户,而不是发布者的帐户。这意味着,例如,如果您有一个特别昂贵的 GPT,会生成大量图像,那么其他人使用它时,不会影响您的帐户。

Whether you share your GPT with friends and colleagues or publicly in the GPT Store, the assistant’s usage is taken from the account using it, not the publisher. This means if you have a particularly expensive GPT that generates a lot of images, for example, it won’t affect your account while others use it.

3.5.1昂贵的 GPT 助手

3.5.1 Expensive GPT assistants

截至撰稿时,OpenAI 会追踪您的 ChatGPT 帐户的资源使用情况,包括 GPT 模型所使用的资源。如果您达到资源使用上限并被封禁,您的 ChatGPT 帐户也会被封禁。封禁通常只会持续几个小时,但这无疑会令人非常恼火。

At the time of writing, OpenAI tracks the resource usage of your ChatGPT account, including that used for GPTs. If you hit a resource usage limit and get blocked, your ChatGPT account will also be blocked. Blockages typically only last a couple of hours, but this can undoubtedly be more than a little annoying.

因此,我们希望确保使用您的 GPT 的用户不会超出其正常使用情况下的资源使用限制。以下列出了在使用 GPT 时会增加资源使用量的功能:

Therefore, we want to ensure that users using your GPT don’t exceed their resource usage limits for regular use. Following is a list of features that increase resource usage while using the GPT:

  • 创建图片 ——图片生成仍然是一项付费服务​​,频繁生成图片可能会导致用户账号被封禁。通常建议您告知用户潜在风险,并/或尽量减少图片生成的频率。
  • Creating images —Image generation is still a premium service, and successive image generation can quickly get your user blocked. It’s generally recommended that you inform your users of the potential risks and/or try to reduce how frequently images are generated.
  • 代码解析 ——此功能允许用户上传文件并运行代码以进行数据分析。如果您认为用户需要频繁使用此编码工具,请务必告知他们其中的风险。
  • Code interpretation —This feature allows for file uploads and running of code for data analysis. If you think your users will require constant use of the coding tool, then inform them of the risk.
  • 视觉,描述图像 ——如果你正在构建一个使用视觉来描述图像并从中提取信息的助手,请计划谨慎使用它。
  • Vision, describing images —If you’re building an assistant that uses vision to describe and extract information from the image, plan to use it sparingly.
  • 文件上传 ——如果您的 GPT 使用大量文件或允许上传多个文件,可能会触发账号封禁。一如既往,请引导用户避免任何妨碍他们享受您的 GPT 的情况。
  • File uploads —If your GPT uses a lot of files or allows you to upload several files, this may cause blocks. As always, guide the user away from anything preventing them from enjoying your GPT.

让你的助手感知资源使用情况其实很简单,只需将清单 3.18 中所示的规则添加到助手的指令中即可。指令可以只是一条向用户传达警告并告知助手的语句。你甚至可以要求助手限制某些功能的使用。

Making your assistant aware of resource usage can be as simple as adding the rule shown in listing 3.18 to the assistant’s instructions. The instructions can be just a statement relaying the warning to the user and making the assistant aware. You could even ask the assistant to limit its usage of certain features.

清单 3.18资源使用规则示例
规则:
生成图像时,请确保用户了解:快速连续创建多张
图片可能会导致其账号被暂时封禁。
RULE:
When generating images, ensure the user is aware that creating multiple 
images quickly could temporarily block their account.

引导你的助手更加注重资源使用,最终会让它更加好用。这也有助于避免用户在使用你的助手时不知不觉被封禁而心生不满。如果你计划发布你的 GPT,这一点尤为重要,但在此之前,让我们在下一节探讨其经济效益。

Guiding your assistant to be more resource conscious in the end makes your assistant more usable. It also helps prevent angry users who unknowingly get blocked using your assistant. This may be important if you plan on releasing your GPT, but before that, let’s investigate the economics in the next section.

3.5.2理解GPT的经济性

3.5.2 Understanding the economics of GPTs

随着 GPT Assistant 和 GPT Store 的发布,OpenAI 宣布未来可能会推出一项针对 GPT 发布者的利润分成计划。虽然我们仍在等待更多关于该计划的信息,但许多人已经开始猜测其具体形式。

Upon the release of GPT Assistants and the GPT Store, OpenAI announced the potential for a future profit-sharing program for those who published GPTs. While we’re still waiting to hear more about this program, many have speculated what this may look like.

有人认为,GPT 应用商店可能只会将 10% 到 20% 的利润返还给开发者。这远低于其他应用平台的返利比例,但对技术知识和资源的要求也低得多。GPT 应用商店里充斥着大量几乎免费的助手,前提是您拥有 Plus 会员资格,但这种情况未来可能会改变。无论如何,您可能出于以下几个原因想要创建公开的 GPT 应用:

Some have suggested the store may return only 10% to 20% of profits to the builders. This is far less than the percentage on other app platforms but requires much less technical knowledge and fewer resources. The GPT Store is flooded with essentially free assistants, provided you have a Plus subscription, but that may change in the future. Regardless, there are also several reasons why you may want to build public GPTs:

  • 个人作品集 ——或许您想展示您在快速工程方面的知识,或者您构建下一代人工智能应用的能力。在 GPT 应用商店中拥有一些 GPT 作品可以帮助您展示您的知识和创建实用人工智能应用的能力。
  • Personal portfolio —Perhaps you want to demonstrate your knowledge of prompt engineering or your ability to build the next wave of AI applications. Having a few GPTs in the GPT Store can help demonstrate your knowledge and ability to create useful AI applications.
  • 知识和经验 ——如果您对某个学科或主题拥有深入的了解,将其打包成一个助手是很好的方式。这类助手的受欢迎程度会因您的专业领域而异。
  • Knowledge and experience —If you have in-depth knowledge of a subject or topic, this can be a great way to package that as an assistant. These types of assistants will vary in popularity based on your area of expertise.
  • 交叉营销和商业合作 ——这种做法在商店中越来越普遍,使公司能够利用助手引导客户。随着公司更多地整合人工智能,这种做法无疑会更加普遍。
  • Cross-marketing and commercial tie-in —This is becoming more common in the Store and provides companies the ability to lead customers using an assistant. As companies integrate more AI, this will certainly be more common.
  • 为您的产品/服务提供得力助手 ——并非所有公司或组织都能承担得起托管聊天机器人的成本。虽然目前只有 ChatGPT 的订阅用户才能使用助手,但未来它们可能会更加普及。这可能意味着所有事物都将拥有 GPT 助手,就像互联网早期,每家公司都争相建立自己的网站一样。
  • Helpful assistant to your product/service —Not all companies or organizations can sustain the cost of hosting chatbots. While consuming assistants is currently limited to ChatGPT subscribers, they will likely be more accessible in the future. This may mean having GPTs for everything, perhaps like the internet’s early days where every company rushed to build a web presence.

虽然目前的 GPT 商店仅面向 ChatGPT 订阅用户,但如果 OpenAI 目前的趋势持续下去,我们很可能会看到一个完全公开的 GPT 商店。公开的 GPT 有可能颠覆我们搜索、了解产品和服务以及使用互联网的方式。在本章最后一节中,我们将探讨如何发布 GPT 以及一些重要的注意事项。

While the current form of the GPT Store is for ChatGPT subscribers, if the current trend with OpenAI continues, we’ll likely see a fully public GPT Store. Public GPTs have the potential to disrupt the way we search, investigate products and services, and consume the internet. In the last section of this chapter, we’ll examine how to publish a GPT and some important considerations.

3.5.3发布 GPT

3.5.3 Releasing the GPT

好的,您对自己的 GPT 及其运行情况感到满意,并且看到了将其分享给他人带来的实际好处。如图 3.14 所示,将 GPT 发布供公众(订阅者)使用非常简单。选择“GPT 商店”选项并点击“保存”后,您就可以设置类别并提供指向您的链接了。

Okay, you’re happy with your GPT and how it operates, and you see real benefit from giving it to others. Publishing GPTs for public (subscribers) consumption is easy, as shown in figure 3.14. After selecting the GPT Store as the option and clicking Save, you’ll now have the option to set the category and provide links back to you.

图 3.14点击“保存”后选择发布到 GPT 商店的选项

这很简单,所以这里还有一些在发布 GPT 之前需要考虑的事项:

That is easy, so here are a few more things you’ll want to consider before publishing your GPT:

  • GPT 描述 ——创建一段好的描述,您甚至可以请 ChatGPT 帮助您撰写一段能够提升 GPT 搜索引擎优化 (SEO) 效果的描述。GPT 现在会出现在 Google 搜索结果中,因此良好的搜索引擎优化有助于提高您的助手的曝光率。一段好的描述还能帮助用户决定是否愿意花时间使用您的助手。
  • GPT description —Create a good description, and you may even want to ask ChatGPT to help you build a description that increases the search engine optimization (SEO) of your GPT. GPTs are now showing up in Google searches, so good search engine optimization can help increase exposure to your assistant. A good description will also help users decide if they want to take the time to use your assistant.
  • 标志 ——一个简洁美观、能清晰体现助手功能的标志无疑大有帮助。为 GPT 设计标志实际上是一项免费服务,但花些时间反复打磨几稿,确实有助于吸引用户使用你的助手。
  • The logo —A nice, clean logo that identifies what your assistant does can undoubtedly help. Logo design for GPTs is effectively a free service, but taking the time to iterate over a few images can help draw users to your assistant.
  • 类别 ——默认情况下,类别已被选中,但请确保它符合您的助手需求。如果您觉得不合适,请更改类别,您甚至可以选择“其他”并自定义类别。
  • The category —By default, the category will already be selected, but make sure it fits your assistant. If you feel it doesn’t, then change the category, and you may even want to select Other and define your own.
  • 链接 ——务必添加社交媒体链接,甚至可以添加用于跟踪 GPT 问题的 GitHub 代码库链接。在 GPT 中添加链接可以向用户表明,如果他们遇到问题或有任何疑问,可以联系构建者。
  • Links —Be sure to set reference links for your social media and perhaps even a GitHub repository that you use to track problems for the GPT. Adding links to your GPT demonstrates to users that they can reach out to the builder if they encounter problems or have questions.

随着 GPT 应用商店的日趋成熟,未来可能会出现更多需求。商业模式尚待建立,其他经验教训也可能会随之而来。无论您是为自己还是为他人构建 GPT,这样做都有助于您更好地理解如何构建代理和助手。正如本书后续章节将要阐述的,GPT 助手是您知识体系中一个重要的基础。

Further requirements may likely emerge as the GPT Store matures. The business model remains to be established, and other learnings will likely follow. Whether you decide to build GPTs for yourself or others, doing so can help improve your understanding of how to build agents and assistants. As we’ll see throughout the rest of this book, GPT assistants are a useful foundation for your knowledge.

3.6练习

3.6 Exercises

完成以下练习以加深你对相关内容的理解:

Complete the following exercises to improve your knowledge of the material:

  • 练习 1  — 构建你的第一个 GPT 助手
  • Exercise 1 —Build Your First GPT Assistant

目标 ——使用 ChatGPT 界面创建一个简单的 GPT 助手。

Objective —Create a simple GPT assistant using the ChatGPT interface.

任务:

Tasks:

    • 如果您还没有ChatGPT Plus会员资格,请注册一个。
    • 导航至 GPT Assistants 平台,然后单击“创建”按钮。
    • 按照 Builder 聊天界面的指示,创建一个烹饪助手,该助手可以根据现有食材提供膳食建议。
    • 手动配置助手,添加自定义的食谱生成规则,例如包含营养信息和成本估算。
    • Sign up for a ChatGPT Plus subscription if you don’t already have one.
    • Navigate to the GPT Assistants platform, and click the Create button.
    • Follow the Builder chat interface to create a Culinary Companion assistant that provides meal suggestions based on available ingredients.
    • Manually configure the assistant to add custom rules for recipe generation, such as including nutritional information and cost estimates.
  • 练习 2  — 数据分析助手
  • Exercise 2 —Data Analysis Assistant

目标 ——开发一个能够分析 CSV 文件并提供分析结果的 GPT 助手。

Objective —Develop a GPT assistant that can analyze CSV files and provide insights.

任务:

Tasks:

    • 设计一个数据科学助手,它可以加载和分析 CSV 文件,类似于本章中的 Data Scout 示例。
    • 启用代码解释工具,并上传示例 CSV 文件(例如,来自 Kaggle 的数据集)。
    • 使用助手执行数据清理、可视化和假设检验等任务。
    • 记录你的过程和发现,并注明遇到的任何挑战或需要改进的地方。
    • Design a data science assistant that can load and analyze CSV files, similar to the Data Scout example in the chapter.
    • Enable the Code Interpretation tool, and upload a sample CSV file (e.g., a dataset from Kaggle).
    • Use the assistant to perform tasks such as data cleaning, visualization, and hypothesis testing.
    • Document your process and findings, noting any challenges or improvements needed.
  • 练习 3  — 创建自定义操作
  • Exercise 3 —Create a Custom Action

目标 ——使用 FastAPI 服务为 GPT 助手添加自定义操作。

Objective —Extend a GPT assistant with a custom action using a FastAPI service.

任务:

Tasks:

    • 按照以下步骤创建一个提供特定功能的 FastAPI 服务,例如获取每日任务列表。
    • 为该服务生成 OpenAPI 规范,并使用 ngrok 在本地部署它。
    • 配置一个新的助手来使用此自定义操作,确保它能正确连接到 FastAPI 端点。
    • 通过让助手执行操作并验证输出来测试该助手。
    • Follow the steps to create a FastAPI service that provides a specific function, such as fetching a list of daily tasks.
    • Generate the OpenAPI specification for the service, and deploy it locally using ngrok.
    • Configure a new assistant to use this custom action, ensuring it connects correctly to the FastAPI endpoint.
    • Test the assistant by asking it to perform the action and verify the output.
  • 练习 4  — 文件上传知识助手
  • Exercise 4 —File Upload Knowledge Assistant

目标 ——根据上传的文档构建一个具有专业知识的助手。

Objective —Build an assistant with specialized knowledge from uploaded documents.

任务:

Tasks:

    • 选择一本免费电子书或与特定主题相关的文档集(例如,经典文学作品、技术手册)。
    • 将这些文件上传到新的 GPT 助手,并将该助手配置为对上传的内容进行专家分析。
    • 创建一系列提示,以测试助手引用和总结文档信息的能力。
    • 评估助手的表现,并进行必要的调整以提高其准确性和帮助性。
    • Select a freely available e-book or a collection of documents related to a specific topic (e.g., classic literature, technical manuals).
    • Upload these files to a new GPT assistant, and configure the assistant to act as an expert on the uploaded content.
    • Create a series of prompts to test the assistant’s ability to reference and summarize the information from the documents.
    • Evaluate the assistant’s performance, and make any necessary adjustments to improve its accuracy and helpfulness.
  • 练习 5  — 发布和分享你的助手
  • Exercise 5 —Publish and Share Your Assistant

目标 ——将您的 GPT 助手发布到 GPT 商店并与他人分享。

Objective —Publish your GPT assistant to the GPT Store and share it with others.

任务:

Tasks:

    • 完成助手的配置和测试,确保其按预期运行。
    • 撰写一份引人入胜的描述,并为你的助手设计一个合适的标志。
    • 选择正确的类别,并设置与您的社交媒体或 GitHub 存储库的任何必要链接。
    • 将助手发布到 GPT 商店,并将链接分享给朋友或同事。
    • 收集用户反馈,并根据他们的意见改进助手,以提高其可用性和功能。
    • Finalize the configuration and testing of your assistant to ensure it works as intended.
    • Write a compelling description, and create an appropriate logo for your assistant.
    • Choose the correct category, and set up any necessary links to your social media or GitHub repository.
    • Publish the assistant to the GPT Store, and share the link with friends or colleagues.
    • Gather feedback from users, and refine the assistant based on their input to improve its usability and functionality.

总结

Summary

  • OpenAI GPT Assistants 平台支持通过 ChatGPT UI 构建和部署 AI 代理,专注于创建引人入胜且功能强大的助手。
  • The OpenAI GPT Assistants platform enables building and deploying AI agents through the ChatGPT UI, focusing on creating engaging and functional assistants.
  • 你可以利用 GPT 的代码解释功能对用户上传的 CSV 文件进行数据分析,使助手能够像数据科学家一样工作。
  • You can use GPT’s code interpretation capabilities to perform data analysis on user-uploaded CSV files, enabling assistants to function as data scientists.
  • 助手可以通过自定义操作进行扩展,从而允许通过 API 端点与外部服务集成。这包括生成 FastAPI 服务及其对应的 OpenAPI 规范。
  • Assistants can be extended with custom actions, allowing integration with external services via API endpoints. This includes generating FastAPI services and their corresponding OpenAPI specifications.
  • 通过文件上传,可以丰富助理的专业知识,使他们能够成为特定文本或文档的权威来源。
  • Assistants can be enriched with specialized knowledge through file uploads, allowing them to act as authoritative sources on specific texts or documents.
  • 将 GPT 商用化包括将其发布到 GPT 商店,在那里您可以与更广泛的受众分享和推广您的助手。
  • Commercializing your GPT involves publishing it to the GPT Store, where you can share and market your assistant to a broader audience.
  • 构建一个功能齐全的助手需要反复推敲设计提示,定义清晰的用户画像,设定规则,并确保助手的输出符合用户的期望。
  • Building a functional assistant involves iterating through design prompts, defining a clear persona, setting rules, and ensuring the assistant’s output aligns with user expectations.
  • 创建自定义操作需要理解和实现 OpenAPI 规范,使用 ngrok 等工具在本地部署服务,并将这些服务连接到您的助手。
  • Creating custom actions requires understanding and implementing OpenAPI specifications, deploying services locally using tools such as ngrok, and connecting these services to your assistant.
  • 知识助手可以处理各种任务,从搜索和比较文档到根据其知识库生成新内容。
  • Knowledge assistants can handle various tasks, from searching and comparing documents to generating new content based on their knowledge base.
  • 发布助手需要仔细考虑资源使用、用户体验和经济因素,以确保其面向公众使用时的有效性和可持续性。
  • Publishing assistants requires careful consideration of resource usage, user experience, and economic factors to ensure their effectiveness and sustainability for public use.
  • ChatGPT Plus 订阅用户可访问 GPT Store,这是一个学习和提高构建 AI 助手技能的宝贵平台,未来还有可能获得利润分成机会。
  • The GPT Store, available to ChatGPT Plus subscribers, is a valuable platform for learning and gaining proficiency in building AI assistants, with the potential for future profit-sharing opportunities.

4. 探索多智能体系统

4 Exploring multi-agent systems

本章内容

This chapter covers

  • 使用 AutoGen Studio 构建多智能体系统
  • Building multi-agent systems using AutoGen Studio
  • 构建一个简单的多智能体系统
  • Building a simple multi-agent system
  • 创建能够在群聊中协作的代理
  • Creating agents that can work collaboratively over a group chat
  • 使用 CrewAI 构建代理团队和多代理系统
  • Building an agent crew and multi-agent systems using CrewAI
  • 使用 CrewAI 扩展代理数量并探索处理模式
  • Extending the number of agents and exploring processing patterns with CrewAI

现在,让我们踏上从 AutoGen 到 CrewAI 的旅程,这是两个成熟的多智能体平台。首先是 AutoGen,这是微软的一个项目,它不仅支持多个智能体,还提供了一个工作室,帮助你轻松上手智能体开发。之后,我们将更深入地动手编写 AutoGen 智能体代码,利用对话和群聊协作来解决任务。

Now let’s take a journey from AutoGen to CrewAI, two well-established multi-agent platforms. We’ll start with AutoGen, a Microsoft project that supports multiple agents and provides a studio to ease you into working with them. From there, we’ll get more hands-on coding of AutoGen agents to solve tasks using conversations and group chat collaborations.

接下来,我们将转向 CrewAI,这是一个自我定位为企业级的智能体系统,采用了不同的方法。CrewAI 在基于角色的智能体与自主智能体之间取得平衡,可以将它们组织成顺序式或层级式的灵活任务管理系统。我们将探讨 CrewAI 如何解决各种复杂问题。

Then, we’ll transition to CrewAI, a self-described enterprise agentic system that takes a different approach. CrewAI balances role-based and autonomous agents, which can be organized into flexible sequential or hierarchical task management systems. We’ll explore how CrewAI can solve diverse and complex problems.

Multi-agent systems incorporate many of the same tools single-agent systems use but benefit from the ability to provide outside feedback and evaluation to other agents. This ability to support and criticize agent solutions internally gives multi-agent systems more power. We’ll explore an introduction to multi-agent systems, beginning with AutoGen Studio in the next section.

4.1 Introducing multi-agent systems with AutoGen Studio

AutoGen Studio is a powerful tool that employs multiple agents behind the scenes to solve tasks and problems a user directs. This tool has been used to develop some of the more complex code in this book. For that reason and others, it’s an excellent introduction to a practical multi-agent system.

Figure 4.1 shows a schematic diagram of the agent connection/communication patterns AutoGen employs. AutoGen is a conversational multi-agent platform because communication is done using natural language. Natural language conversation seems to be the most natural pattern for agents to communicate, but it’s not the only method, as you’ll see later.

Figure 4.1 How AutoGen agents communicate through conversations (Source: AutoGen)

AutoGen supports various conversational patterns, from group and hierarchical to the more common and simpler proxy communication. In proxy communication, one agent acts as a proxy and directs communication to relevant agents to complete tasks. A proxy is similar to a waiter taking orders and delivering them to the kitchen, which cooks the food. Then, the waiter serves the cooked food.

The basic pattern in AutoGen uses a UserProxy and one or more assistant agents. Figure 4.2 shows the user proxy taking direction from a human and then directing an assistant agent enabled to write code to perform the tasks. Each time the assistant completes a task, the proxy agent reviews, evaluates, and provides feedback to the assistant. This iteration loop continues until the proxy is satisfied with the results.

Figure 4.2 Communication between the user proxy and assistant agents (Source: AutoGen)

The benefit of the proxy is that it works to replace the required human feedback and evaluation, and, in most cases, it does a good job. While it doesn’t eliminate the need for human feedback and evaluation, it produces much more complete results overall. And, while the iteration loop is time consuming, it’s time you could be drinking a coffee or working on other tasks.
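This iterate-until-satisfied pattern can be sketched in plain Python. The `assistant_solve` and `proxy_evaluate` functions below are hypothetical stand-ins for the LLM-backed agents, not AutoGen APIs; only the loop structure mirrors the proxy's behavior:

```python
from typing import Optional, Tuple

def assistant_solve(task: str, feedback: Optional[str]) -> str:
    """Stub assistant: marks the result as revised once it receives feedback."""
    return task if feedback is None else task + " (revised)"

def proxy_evaluate(result: str) -> bool:
    """Stub proxy check: satisfied once the result has been revised."""
    return result.endswith("(revised)")

def run_until_satisfied(task: str, max_turns: int = 5) -> Tuple[str, int]:
    feedback = None
    for turn in range(1, max_turns + 1):
        result = assistant_solve(task, feedback)
        if proxy_evaluate(result):
            return result, turn       # proxy approves; the loop ends
        feedback = "please revise"    # otherwise, the proxy sends feedback
    return result, max_turns          # give up after max_turns iterations

result, turns = run_until_satisfied("write fizz buzz")
```

In AutoGen, the evaluation step is itself an LLM call (or actual code execution), which is why the loop can run unattended while you drink that coffee.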

AutoGen Studio is a tool developed by the AutoGen team that provides a helpful introduction to conversable agents. In the next exercise, we’ll install Studio and run some experiments to see how well the platform performs. These tools are still in a rapid development cycle, so if you encounter any problems, consult the documentation on the AutoGen GitHub repository.

4.1.1 Installing and using AutoGen Studio

Open the chapter_04 folder in Visual Studio Code (VS Code), create a local Python virtual environment, and install the requirements.txt file. If you need assistance with this, consult appendix B to install all of this chapter’s exercise requirements.

Open a terminal in VS Code (Ctrl-`, Cmd-`) pointing to your virtual environment, and run AutoGen Studio using the commands shown in listing 4.1. You’ll first need to define an environment variable for your OpenAI key. Because ports 8080 and 8081 are popular, if other services are already running on them, change the port to 8082 or another port of your choice.

Listing 4.1 Starting AutoGen Studio
# set environment variable on Bash (Git Bash)
export OPENAI_API_KEY="<your API key>"          #1

# set environment variable with PowerShell
$env:OPENAI_API_KEY = "<your API key>"          #1

autogenstudio ui --port 8081     #2
#1 Use the command that matches your terminal type.

#2 Change the port if there is a port conflict on your machine.

Navigate your browser to the AutoGen Studio interface shown in figure 4.3 (as of this writing). While there may be differences, one thing is for sure: the primary interface will still be chat. Enter a complex task that requires coding. The example used here is Create a plot showing the popularity of the term GPT Agents in Google search.

Figure 4.3 Entering a task for the agents to complete in the AutoGen interface

The agent assistant generates code snippets to perform or complete various subtasks as the agents work together through the task in the example. The user proxy agent then attempts to execute those code snippets and assesses the output. In many cases, proving the code runs and produces the required output is sufficient for the user proxy agent to approve the task’s completion.

If you encounter any problems with the assistant agent requests, ask the proxy agent to try a different method or another problem. This highlights a bigger problem with agentic systems using packages or libraries that have expired and no longer work. For this reason, it’s generally better to get agents to execute actions rather than build code to perform actions as tools.

Figure 4.4 shows the agent’s completion of the task. The proxy agent will collect any generated code snippet, images, or other documents and append them to the message. You can also review the agent conversation by opening the Agent Messages expander. In many cases, if you ask the agent to generate plots or applications, secondary windows will open showing those results.

Figure 4.4 The output from the agents after completing the task

Amazingly, the agents will perform most tasks nicely and complete them well. Depending on the complexity of the task, you may need to further iterate with the proxy. Sometimes, an agent may only go so far to complete a task because it lacks the required skills. In the next section, we’ll look at how to add skills to agents.

4.1.2 Adding skills in AutoGen Studio

Skills and tools, or actions, as we refer to them in this book, are the primary means by which agents can extend themselves. Actions give agents the ability to execute code, call APIs, or even further evaluate and inspect generated output. AutoGen Studio currently begins with just a basic set of tools to fetch web content or generate images.

In the following exercise scenario, we’ll add a skill/action to inspect an image using the OpenAI vision model. This will allow the proxy agent to provide feedback if we ask the assistant to generate an image with particular content.

With AutoGen Studio running, go to the Build tab and click Skills, as shown in figure 4.5. Then, click the New Skill button to open a code panel where you can copy and paste code. From this tab, you can also configure models, agents, and agent workflows.

Figure 4.5 The steps to create a new skill on the Build tab

Enter the code shown in listing 4.2 and also provided in the book’s source code as describe_image.py. Copy and paste this code into the editor window, and then click the Save button at the bottom.

Listing 4.2 describe_image.py
import base64
import requests
import os

def describe_image(image_path='animals.png') -> str:
    """
    Uses GPT-4 Vision to inspect and describe the contents of the image.

    :param image_path: str, the name of the PNG file to describe.
    """
    api_key = os.environ['OPENAI_API_KEY']

    # Function to encode the image
    def encode_image(image_path):      #1
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    # Getting the base64 string
    base64_image = encode_image(image_path)

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    payload = {
        "model": "gpt-4-turbo",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What's in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"      #2
                        }
                    }
                ]
            }
        ],
        "max_tokens": 300
    }

    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers=headers,
        json=payload)

    return response.json()["choices"][0]["message"]["content"]      #3
#1 Function that loads and encodes the image as a Base64 string

#2 Includes the image string with the JSON payload

#3 Unpacks the response and returns the response content

The describe_image function uses the OpenAI GPT-4 vision model to describe what is in the image. This skill can be paired with the existing generate_image skill as a quality assessment. The agents can confirm that the generated image matches the user’s requirements.
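The Base64 step at the heart of listing 4.2 is worth isolating. The helper below is a hypothetical extraction, not part of the listing; it shows the data URL format the request body carries. Note that listing 4.2 labels a PNG with an `image/jpeg` MIME type, so matching the MIME type to the actual file is the safer habit:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes into the data URL embedded in the payload."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

url = to_data_url(b"hi")   # tiny stand-in for real image bytes
```

The receiving model decodes everything after the comma back into the original bytes, so any binary image can travel inside a JSON payload this way.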

After the skill is added, it must be added to the specific agent workflow and agent for use. Figure 4.6 demonstrates adding the new skill to the primary assistant agent in the general or default agent workflow.

Figure 4.6 Configuring the primary_assistant agent with the new skill

Now that the skill is added to the primary assistant, we can task the agent with creating a specific image and validating it using the new describe_image skill. Because image generators notoriously struggle with correct text, we’ll create an exercise task to do just that.

Enter the text shown in listing 4.3 to prompt the agents to create a book image cover for this book. We’ll explicitly say that the text needs to be correct and insist that the agent uses the new describe_image function to verify the image.

Listing 4.3 Prompting for the book cover design
Please create a cover for the book GPT Agents In Action, use the 
describe_image skill to make sure the title of the book is spelled 
correctly on the cover

After the prompt is entered, wait for a while, and you may get to see some dialogue exchanged about the image generation and verification process. In the end, though, if everything works correctly, the agents will return with the results shown in figure 4.7.

Figure 4.7 The output files generated by the agents for the image-generation task

Remarkably, the agent coordination completed the task in just a couple of iterations. Along with the images, you can also see the various helper code snippets generated to assist with task completion. AutoGen Studio is impressive in its ability to integrate skills that the agents can further adapt to complete some goal. The following section will show how these powerful agents are implemented in code.

4.2 Exploring AutoGen

While AutoGen Studio is a fantastic tool for understanding multi-agent systems, we must look into the code. Fortunately, coding multiple agent examples with AutoGen is simple and easy to run. We’ll cover the basic AutoGen setup in the next section.

4.2.1 Installing and consuming AutoGen

This next exercise will look at coding a basic multi-agent system that uses a user proxy and conversable agent. Before we do that, though, we want to make sure AutoGen is installed and configured correctly.

Open a terminal in VS Code, and run the entire chapter 4 install directions per appendix B, or run the pip command in listing 4.4. If you’ve installed the requirements.txt file, you’ll also be ready to run AutoGen.

Listing 4.4 Installing AutoGen
pip install pyautogen

Next, copy the chapter_04/OAI_CONFIG_LIST.example to OAI_CONFIG_LIST, removing .example from the file name. Then, open the new file in VS Code, and enter your OpenAI or Azure configuration in the OAI_CONFIG_LIST file in listing 4.5. Fill in your API key, model, and other details per your API service requirements. AutoGen will work with any model that adheres to the OpenAI client. That means you can use local LLMs via LM Studio or other services such as Groq, Hugging Face, and more.

Listing 4.5 OAI_CONFIG_LIST
[
    {
        "model": "gpt-4",                     #1
        "api_key": "<your OpenAI API key here>",            #2
        "tags": ["gpt-4", "tool"]
    },
    {
        "model": "<your Azure OpenAI deployment name>",      #3
        "api_key": "<your Azure OpenAI API key here>",      #4
        "base_url": "<your Azure OpenAI API base here>",     #5
        "api_type": "azure",
        "api_version": "2024-02-15-preview"
    }    
]
#1 Choose the model; GPT-4 is recommended.

#2 Use the key for the service you typically use.

#3 Choose the model; GPT-4 is recommended.

#4 Use the key for the service you typically use.

#5 Changing the base URL lets you point to other services, not just Azure OpenAI.

Now, we can look at the code for a basic multi-agent chat using the out-of-the-box UserProxy and ConversableAgent agents. Open autogen_start.py in VS Code, shown in the following listing, and review the parts before running the file.

Listing 4.6 autogen_start.py
from autogen import ConversableAgent, UserProxyAgent, config_list_from_json


   config_list = config_list_from_json(
        env_or_file="OAI_CONFIG_LIST")      #1

   assistant = ConversableAgent(
        "agent", 
        llm_config={"config_list": config_list})      #2

   user_proxy = UserProxyAgent(      #3
        "user",
        code_execution_config={
            "work_dir": "working",
            "use_docker": False,
        },
        human_input_mode="ALWAYS",
        is_termination_msg=lambda x: x.get("content", "")
        .rstrip()
        .endswith("TERMINATE"),      #4
    )    
    user_proxy.initiate_chat(
        assistant,
        message="write a solution for fizz buzz in one line?")     #5
#1 Loads the LLM configuration from the JSON file OAI_CONFIG_LIST.

#2 This agent communicates directly with the LLM.

#3 This agent proxies the conversation between the user and the assistant.

#4 Setting a termination message allows the agents to iterate.

#5 Initiates the chat with the assistant through the user proxy to complete the task.

Run the code by running the file in VS Code in the debugger (F5). The code in listing 4.6 uses a simple task to demonstrate code writing. Listing 4.7 shows a few examples to choose from. These coding tasks are also some of the author’s regular baselines for assessing an LLM’s strength in coding.

Listing 4.7 Examples of simple coding tasks
write a Python function to check if a number is prime
code a classic snake game using Pygame                   #1
code a classic asteroids game in Python using Pygame  #1
#1 To have fun iterating on these tasks, use Windows Subsystem for Linux (WSL) on Windows, or use Docker.
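For reference, here is roughly what working solutions to the first two baseline tasks look like. These are illustrative answers, not output captured from the agents:

```python
def fizz_buzz(n: int) -> list:
    """Fizz buzz as a single expression, honoring the one-line constraint."""
    return ["FizzBuzz" if i % 15 == 0 else "Fizz" if i % 3 == 0
            else "Buzz" if i % 5 == 0 else str(i) for i in range(1, n + 1)]

def is_prime(n: int) -> bool:
    """Primality by trial division up to the square root."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

sample = fizz_buzz(15)   # ends with "FizzBuzz"
```

Because these tasks have short, easily checked answers, they make good smoke tests: if an agent stumbles on them, it will likely struggle with the Pygame tasks too.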

A few seconds after the code starts, the assistant will respond to the proxy with a solution. At this point, the proxy will prompt you for feedback. Press Enter to give no feedback, and the proxy will run the code to verify that it operates as expected.

Impressively, the proxy agent will even take cues to install required packages such as Pygame. It will then run the code, and you’ll see the output in the terminal, in a new window, or in a browser. If the code spawned a new window or browser, you can play the game or use the interface there.

Note that the spawned window/browser won’t close on Windows and will require exiting the entire program. To avoid this problem, run the code through Windows Subsystem for Linux (WSL) or Docker. AutoGen explicitly recommends using Docker for code execution agents, and if you’re comfortable with containers, this is a good option.

Either way, after the proxy generates and runs the code, the working folder set via work_dir in listing 4.6 should now contain a Python file with the code. This allows you to run the code at your leisure, make changes, or even ask for improvements, as we’ll see. In the next section, we’ll look at how to improve the capabilities of the coding agents.

4.2.2 Enhancing code output with agent critics

One powerful benefit of multi-agent systems is the multiple roles/personas you can automatically assign when completing tasks. Generating or helping to write code can be an excellent advantage to any developer, but what if that code was also reviewed and tested? In the next exercise, we’ll add another agent critic to our agent system to help with coding tasks. Open autogen_coding_critic.py, as shown in the following listing.

Listing 4.8 autogen_coding_critic.py
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

user_proxy = UserProxyAgent(
    "user",
    code_execution_config={
        "work_dir": "working",
        "use_docker": False,
        "last_n_messages": 1,
    },
    human_input_mode="ALWAYS",
    is_termination_msg=lambda x: 
x.get("content", "").rstrip().endswith("TERMINATE"),
)

engineer = AssistantAgent(
    name="Engineer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a professional Python engineer, known for your expertise in 
software development.
    You use your skills to create software applications, tools, and 
games that are both functional and efficient.
    Your preference is to write clean, well-structured code that is easy 
to read and maintain.    
    """,      #1
)

critic = AssistantAgent(
    name="Reviewer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a code reviewer, known for your thoroughness and commitment 
to standards.
    Your task is to scrutinize code content for any harmful or 
substandard elements.
    You ensure that the code is secure, efficient, and adheres to best 
practices.
    You will identify any issues or areas for improvement in the code 
and output them as a list.
    """,      #2
)

def review_code(recipient, messages, sender, config):      #3
    return f"""
            Review and critique the following code.

            {recipient.chat_messages_for_summary(sender)[-1]['content']}
            """                       #3                    

user_proxy.register_nested_chats(      #4
    [
        {
            "recipient": critic,
            "message": review_code,
            "summary_method": "last_msg",
            "max_turns": 1,
        }
    ],
    trigger=engineer,                 #4
)
task = """Write a snake game using Pygame."""

res = user_proxy.initiate_chat(
    recipient=engineer, 
    message=task, 
    max_turns=2, 
    summary_method="last_msg"      #5
)
#1 This time, the assistant receives a system/persona message.

#2 Creates a second assistant, the critic agent, with its own background.

#3 A custom function helps extract the code for the critic to review.

#4 Creates a nested chat between the critic and the engineer.

#5 The proxy initiates the chat with a maximum number of turns and an explicit summary method.

Run the autogen_coding_critic.py file in VS Code in debug mode, and watch the dialog between the agents. This time, after the code returns, the critic will also be triggered to respond. Then, the critic will add comments and suggestions to improve the code.

Nested chats work well for supporting and controlling agent interactions, but we’ll see a better approach in the following section. Before that though, we’ll review the importance of the AutoGen cache in the next section.

4.2.3 Understanding the AutoGen cache

AutoGen can consume many tokens over chat iterations as a conversable multi-agent platform. If you ask AutoGen to work through complex or novel problems, you may even encounter token limits on your LLM; because of this, AutoGen supports several methods to reduce token usage.

AutoGen uses caching to store progress and reduce token usage. Caching is enabled by default, and you may have already encountered it. If you check your current working folder, you’ll notice a .cache folder, as shown in figure 4.8. Caching allows your agents to continue conversations if they get interrupted.

Figure 4.8 The AutoGen cache and working folders

In code, you can control the cache folder for your agent’s run, as shown in listing 4.9. By wrapping the initiate_chat call with the with statement, you can control the location and seed for the cache. This will allow you to save and return to long-running AutoGen tasks in the future by just setting the cache_seed for the previous cache.

Listing 4.9 Setting the cache folder
with Cache.disk(cache_seed=42) as cache:     #1
    res = user_proxy.initiate_chat(
        recipient=engineer,
        message=task,
        max_turns=2,
        summary_method="last_msg",
        cache=cache,      #2
    )
#1 Setting cache_seed specifies the particular location of the cache.

#2 Sets the cache as a parameter.

This caching ability allows you to continue operations from the previous cache location and captures previous runs. It can also be a great way to demonstrate and inspect how an agent conversation generated the results. In the next section, we’ll look at another conversational pattern that AutoGen supports: group chat.

4.3 Group chat with agents and AutoGen

One problem with chat delegation and nested chats or conversations is the conveyance of information. If you’ve ever played the telephone game, you’ve witnessed this firsthand and experienced how quickly information can change over iterations. With agents, this is certainly no different, and chatting through nested or sequential conversations can alter the task or even the desired result.

Figure 4.9 shows the difference between nested and collaborative group chats. We used the nested chat feature in the previous section to build a nested agent chat. In this section, we use the group chat to provide a more collaborative experience.

Figure 4.9 The difference between nested and group chats for conversable agents

Open autogen_coding_group.py, whose relevant parts are shown in listing 4.10. The code is similar to the previous exercise but now introduces GroupChat and GroupChatManager. The agents and messages are held within the group chat, similar to a messaging channel in applications such as Slack or Discord. The chat manager coordinates the message responses to reduce conversation overlap.

Listing 4.10 autogen_coding_group.py (relevant sections)
user_proxy = UserProxyAgent(
    "user",
    code_execution_config={
        "work_dir": "working",
        "use_docker": False,
        "last_n_messages": 3,
    },
    human_input_mode="NEVER",     #1
)

llm_config = {"config_list": config_list}

engineer = AssistantAgent(…      #2


critic = AssistantAgent(…       #2


groupchat = GroupChat(agents=[user_proxy, 
                              engineer, 
                              critic], 
                              messages=[], 
                              max_round=20)      #3
manager = GroupChatManager(groupchat=groupchat, 
                           llm_config=llm_config)     #4

task = """Write a snake game using Pygame."""

with Cache.disk(cache_seed=43) as cache:
    res = user_proxy.initiate_chat(
        recipient=manager,
        message=task,
        cache=cache,
    )
#1 Human input is set to NEVER, so there will be no human feedback.

#2 Code is omitted here, but note the persona changes in the file.

#3 This object holds the connections to all the agents and stores the messages.

#4 The manager coordinates the conversation like a moderator.

Run this exercise, and you’ll see how the agents collaborate. The engineer will now take feedback from the critic and undertake operations to address the critic’s suggestions. This also allows the proxy to engage in all of the conversation.

Group conversations are an excellent way to strengthen your agents’ abilities as they collaborate on tasks. However, they are also substantially more verbose and token expensive. Of course, as LLMs mature, so do the size of their context token windows and the price of token processing. As token windows increase, concerns over token consumption may eventually go away.
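
To make the token concern concrete, here is a rough back-of-the-envelope model, not AutoGen code, of how prompt tokens accumulate in a group chat. The per-message token count is an assumption; the key point is that the growing history is re-sent to the LLM each round:

```python
def groupchat_token_estimate(rounds: int, agents: int, tokens_per_message: int) -> int:
    """Rough estimate of total tokens consumed by a group chat.

    Each round, every agent produces one message, and the accumulated
    history is re-sent as context, so usage grows quadratically with rounds.
    """
    total = 0
    history = 0
    for _ in range(rounds):
        total += history + agents * tokens_per_message  # context resent + new replies
        history += agents * tokens_per_message
    return total

short_chat = groupchat_token_estimate(rounds=5, agents=3, tokens_per_message=100)
long_chat = groupchat_token_estimate(rounds=20, agents=3, tokens_per_message=100)
```

Under these assumptions, quadrupling the rounds multiplies token usage fourteen-fold, which is why caps such as max_round matter.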

AutoGen is a powerful multi-agent platform that can be experienced using a web interface or code. Whatever your preference, this agent collaboration tool is an excellent platform for building code or other complex tasks. Of course, it isn’t the only platform, as you’ll see in the next section, where we explore a newcomer called CrewAI.

4.4 Building an agent crew with CrewAI

CrewAI is relatively new to the realm of multi-agent systems. Where AutoGen was initially developed from research and then extended, CrewAI is built with enterprise systems in mind. As such, the platform is more robust, making it less extensible in some areas.

With CrewAI, you build a crew of agents to focus on specific areas of a task goal. Unlike AutoGen, CrewAI doesn’t require the use of the user proxy agent but instead assumes the agents only work among themselves.

Figure 4.10 shows the main elements of the CrewAI platform, how they connect together, and their primary function. It shows a sequential-processing agent system with generic researcher and writer agents. Agents are assigned tasks that may also include tools or memory to assist them.

Figure 4.10 The elements of a CrewAI system

CrewAI supports two primary forms of processing: sequential and hierarchical. Figure 4.10 shows the sequential process by iterating across the given agents and their associated tasks. In the next section, we dig into some code to set up a crew and employ it to complete a goal and create a good joke.
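
The sequential process can be sketched in plain Python. This is a toy model, not the CrewAI API: each task is bound to an agent, and the output of one task becomes the context for the next:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToyTask:
    description: str
    agent: Callable[[str], str]  # an "agent" here is just a function

def run_sequential(tasks: list[ToyTask], topic: str) -> str:
    """Run tasks in order, feeding each task's output into the next."""
    context = topic
    for task in tasks:
        context = task.agent(f"{task.description}: {context}")
    return context

# stand-ins for the researcher and writer agents
researcher = lambda prompt: prompt + " -> [research notes]"
writer = lambda prompt: prompt + " -> [finished joke]"

result = run_sequential(
    [ToyTask("research the topic", researcher), ToyTask("write the joke", writer)],
    "AI engineer jokes",
)
```

Hierarchical processing replaces this fixed ordering with a manager that decides which agent acts next, as you'll see later in the chapter.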

4.4.1 Creating a jokester crew of CrewAI agents

CrewAI requires more setup than AutoGen, but this also allows for more control and additional guides, which provide more specific context to guide the agents in completing the given task. This isn’t without problems, but it does offer more control than AutoGen out of the box.

Open crewai_introduction.py in VS Code and look at the top section, as shown in listing 4.11. Many settings are required to configure an agent, including the role, goal, verboseness, memory, backstory, delegation, and even tools (not shown). In this example, we’re using two agents: a senior joke researcher and a joke writer.

Listing 4.11 crewai_introduction.py (agent section)
from crewai import Agent, Crew, Process, Task
from dotenv import load_dotenv

load_dotenv()

joke_researcher = Agent(      #1
    role="Senior Joke Researcher",
    goal="Research what makes things funny about the following {topic}",
    verbose=True,      #2
    memory=True,      #3
    backstory=(      #4
        "Driven by slapstick humor, you are a seasoned joke researcher"
        "who knows what makes people laugh. You have a knack for finding"
        "the funny in everyday situations and can turn a dull moment into"
        "a laugh riot."
    ),
    allow_delegation=True,     #5
)

joke_writer = Agent(     #6
    role="Joke Writer",
    goal="Write a humourous and funny joke on the following {topic}",
    verbose=True,     #7
    memory=True,      #8
    backstory=(     #9
        "You are a joke writer with a flair for humor. You can turn a"
        "simple idea into a laugh riot. You have a way with words and"
        "can make people laugh with just a few lines."
    ),
    allow_delegation=False,    #5
)
#1 Creates the agent and gives it a goal.

#2 Enabling verbose output allows the agent to print to the terminal.

#3 Enables the agent's use of memory.

#4 The backstory is the agent's background, in other words, its persona.

#5 Agents can be delegated to or allowed to delegate; True means they can delegate.

#6 Creates the agent and gives it a goal.

#7 Enabling verbose output allows the agent to print to the terminal.

#8 Enables the agent's use of memory.

#9 The backstory is the agent's background, in other words, its persona.

Moving down the code, we next see the tasks, as shown in listing 4.12. Tasks denote an agent’s process to complete the primary system goal. They also link an agent to work on a specific task, define the output from that task, and may include how it’s executed.

Listing 4.12 crewai_introduction.py (task section)
research_task = Task(         #1
    description=(
        "Identify what makes the following topic:{topic} so funny."
        "Be sure to include the key elements that make it humourous."
        "Also, provide an analysis of the current social trends,"
        "and how it impacts the perception of humor."
    ),
    expected_output="A comprehensive 3 paragraphs long report 
     on the latest jokes.",               #2
    agent=joke_researcher,      #3
)

write_task = Task(   #4
    description=(
        "Compose an insightful, humourous and socially aware joke on {topic}."
        "Be sure to include the key elements that make it funny and"
        "relevant to the current social trends."
    ),
    expected_output="A joke on {topic}.",   #5
    agent=joke_writer,        #3
    async_execution=False,          #6
    output_file="the_best_joke.md",      #7
)
#1 The task description defines how the agent completes the task.

#2 Explicitly defines the expected output of executing the task.

#3 The agent assigned to execute the task.

#4 The task description defines how the agent completes the task.

#5 Explicitly defines the expected output of executing the task.

#6 Whether the agent should execute asynchronously.

#7 Any output the agent will produce.

Now, we can see how everything comes together as the Crew at the bottom of the file, as shown in listing 4.13. Again, many options can be set when building the Crew, including the agents, tasks, process type, memory, cache, maximum requests per minute (max_rpm), and whether the crew shares information.

Listing 4.13 crewai_introduction.py (crew section)
crew = Crew(
    agents=[joke_researcher, joke_writer],    #1
    tasks=[research_task, write_task],     #2
    process=Process.sequential,      #3
    memory=True,      #4
    cache=True,     #5
    max_rpm=100,     #6
    share_crew=True,     #7
)

result = crew.kickoff(inputs={"topic": "AI engineer jokes"})
print(result)
#1 The agents that make up the crew

#2 The tasks the agents can perform

#3 Defines how the agents interact

#4 Whether the system uses memory; this must be set if agents/tasks enable memory

#5 Whether the system uses a cache, similar to AutoGen

#6 The system's maximum requests per minute

#7 Whether the crew shares information, similar to a group chat

When you’re done reviewing, run the file in VS Code (F5), and watch the terminal for conversations and messages from the crew. As you can probably tell by now, the goal of this agent system is to craft jokes related to AI engineering. Here are some of the funnier jokes generated over a few runs of the agent system:

  • Why was the computer cold? It left Windows open.
  • Why don’t AI engineers play hide and seek with their algorithms? Because no matter where they hide, the algorithms always find them in the “overfitting” room!
  • What is an AI engineer’s favorite song? “I just called to say I love you … and to collect more data for my voice recognition software.”
  • Why was the AI engineer broke? Because he spent all his money on cookies, but his browser kept eating them.

Before you run more iterations of the joke crew, you should read the next section. This section shows how to add observability to the multi-agent system.

4.4.2 Observing agents working with AgentOps

Observing a complex assemblage such as a multi-agent system is critical to understanding the myriad of problems that can happen. Observability through application tracing is a key element of any complex system, especially one engaged in enterprise use.

CrewAI supports connecting to a specialized agent operations platform appropriately called AgentOps. This observability platform is generic and designed to support observability with any agent platform specific to LLM usage. Currently, no pricing or commercialization details are available.

Connecting to AgentOps is as simple as installing the package, getting an API key, and adding a line of code to your crew setup. This next exercise will go through the steps to connect and run AgentOps.

Listing 4.14 shows installing the agentops package using pip. You can install the package alone or as an additional component of the crewai package. Remember that AgentOps can also be connected to other agent platforms for observability.

Listing 4.14 Installing AgentOps
pip install agentops

or as an option with CrewAI

pip install crewai[agentops]

Before using AgentOps, you need to sign up for an API key. Following are the general steps to sign up for a key at the time of writing:

  1. Visit https://app.agentops.ai in your browser.
  2. Sign up for an account.
  3. Create a project, or use the default.
  4. Go to Settings > Projects and API Keys.
  5. Copy and/or generate a new API key; this will copy the key to your clipboard.
  6. Paste the key to your .env file in your project.

After the API key is copied, it should resemble the example shown in the following listing.

Listing 4.15 .env: Adding the AgentOps key
AGENTOPS_API_KEY="your API key"
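
A small guard like the following can fail fast with a clear message when the key is missing. This is a sketch, not part of the AgentOps SDK; agentops.init() itself reads the environment variable:

```python
import os

def require_agentops_key() -> str:
    """Return the AgentOps API key or raise a clear error if it isn't set."""
    key = os.getenv("AGENTOPS_API_KEY")
    if not key:
        raise RuntimeError(
            "AGENTOPS_API_KEY is not set; add it to the .env file in your project"
        )
    return key

# Demo only: set a placeholder so the check passes in this sketch.
os.environ.setdefault("AGENTOPS_API_KEY", "demo-key")
key = require_agentops_key()
```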

Now, we need to add a few lines of code to the CrewAI script. Listing 4.16 shows the additions as they are added to the crewai_agentops.py file. When creating your own scripts, all you need to do is add the agentops package and initialize it when using CrewAI.

Listing 4.16 crewai_agentops.py (AgentOps additions)
import agentops      #1
from crewai import Agent, Crew, Process, Task
from dotenv import load_dotenv

load_dotenv()
agentops.init()     #2
#1 Adds the required package

#2 Be sure to initialize the package after loading the environment variables.

Run the crewai_agentops.py file in VS Code (F5), and watch the agents work as before. However, you can now go to the AgentOps dashboard and view the agent interactions at various levels.

Figure 4.11 shows the dashboard for running the joke crew to create the best joke. The statistics include total duration, the run environment, prompt and completion tokens, LLM call timings, and estimated cost. Seeing the cost can be both sobering and indicative of how verbose agent conversations can become.

Figure 4.11 The AgentOps dashboard for a run of the joke crew

The AgentOps platform is an excellent addition to any agent platform. While it’s built into CrewAI, it’s helpful that the observability could be added to AutoGen or other frameworks. Another attractive thing about AgentOps is that it was built to observe agent interactions rather than being retrofitted from a machine learning operations platform. In the future, we’ll likely see more agent observability patterns emerge.

One benefit that can’t be overstated is the cost observation that an observability platform can provide. Did you notice in figure 4.11 that creating a single joke costs a little over 50 cents? Agents can be very powerful, but they can also become very costly, and it’s essential to observe what those costs are in terms of practicality and commercialization.
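
The arithmetic behind that figure is simple. In the sketch below, the per-1,000-token rates and token counts are placeholder assumptions, not current provider pricing:

```python
# Hypothetical per-1,000-token rates; check your provider's pricing page.
PROMPT_RATE = 0.01       # dollars per 1K prompt tokens
COMPLETION_RATE = 0.03   # dollars per 1K completion tokens

def run_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated dollar cost of one agent run."""
    return (prompt_tokens / 1000) * PROMPT_RATE + (completion_tokens / 1000) * COMPLETION_RATE

# A verbose multi-agent run can easily resend tens of thousands of prompt tokens.
joke_run = run_cost(prompt_tokens=40_000, completion_tokens=4_000)
```

With these assumed rates, 40,000 prompt tokens and 4,000 completion tokens come to about 52 cents for a single joke, in line with the dashboard in figure 4.11.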

In the last section of this chapter, we’ll return to CrewAI and revisit building agents that can code games. This will provide an excellent comparison between the capabilities of AutoGen and CrewAI.

4.5 Revisiting coding agents with CrewAI

A great way to compare capabilities between multi-agent platforms is to implement similar tasks in a bot. In this next set of exercises, we’ll employ CrewAI as a game programming team. Of course, this could be adapted to other coding tasks as well.

Open crewai_coding_crew.py in VS Code, and we’ll first review the agent section in listing 4.17. Here, we’re creating a senior engineer, a QA engineer, and a chief QA engineer with a role, goal, and backstory.

Listing 4.17 crewai_coding_crew.py (agent section)
print("## Welcome to the Game Crew")      #1
print("-------------------------------")
game = input("What is the game you would like to build?
 What will be the mechanics?\n")


senior_engineer_agent = Agent(
    role="Senior Software Engineer",
    goal="Create software as needed",
    backstory=dedent(
        """
        You are a Senior Software Engineer at a leading tech think tank.
        Your expertise is in Python programming, and you do your best to
        produce perfect code.
        """
    ),
    allow_delegation=False,
    verbose=True,
)

qa_engineer_agent = Agent(
    role="Software Quality Control Engineer",
    goal="create prefect code, by analizing the code 
 that is given for errors",
    backstory=dedent(
        """
        You are a software engineer that specializes in checking code
        for errors. You have an eye for detail and a knack for finding
        hidden bugs.
        You check for missing imports, variable declarations, mismatched
        brackets and syntax errors.
        You also check for security vulnerabilities, and logic errors
        """
    ),
    allow_delegation=False,
    verbose=True,
)

chief_qa_engineer_agent = Agent(
    role="Chief Software Quality Control Engineer",
    goal="Ensure that the code does the job that it is supposed to do",
    backstory=dedent(
        """
        You are a Chief Software Quality Control Engineer at a leading
        tech think tank. You are responsible for ensuring that the code
        that is written does the job that it is supposed to do.
        You are responsible for checking the code for errors and ensuring
        that it is of the highest quality.
        """
    ),
    allow_delegation=True,     #2
    verbose=True,
)
#1 Allows the user to input the game instructions

#2 Only the chief QA engineer can delegate tasks.

Scrolling down in the file will display the agent tasks, as shown in listing 4.18. The task descriptions and expected output should be easy to follow. Again, each agent has a specific task to provide better context when working to complete the task.

Listing 4.18 crewai_coding_crew.py (task section)
code_task = Task(
    description=f"""
You will create a game using python, these are the instructions:
        Instructions
        ------------
        {game}             #1
        You will write the code for the game using python.""",
    expected_output="Your Final answer must be the 
 full python code, only the python code and nothing else.",
    agent=senior_engineer_agent,
)

qa_task = Task(
    description=f"""You are helping create a game 
 using python, these are the instructions:
        Instructions
        ------------
        {game}            #1
        Using the code you got, check for errors. Check for logic errors,
        syntax errors, missing imports, variable declarations, 
mismatched brackets,
        and security vulnerabilities.""",
    expected_output="Output a list of issues you found in the code.",
    agent=qa_engineer_agent,
)

evaluate_task = Task(
    description=f"""You are helping create a game 
 using python, these are the instructions:
        Instructions
        ------------
        {game}            #1
        You will look over the code to ensure that it is complete and
        does the job that it is supposed to do. """,
    expected_output="Your Final answer must be the 
 corrected a full python code, only the python code and nothing else.",
    agent=chief_qa_engineer_agent,
)
#1 The game instructions are substituted into the prompt using Python string formatting.

最后,我们可以通过查看文件末尾的内容(如清单 4.19 所示)来了解整个过程。这种人员配置与我们之前看到的非常相似。每个代理和任务,以及详细模式和进程属性都会被添加进去。在这个例子中,我们将继续使用顺序方法。

Finally, we can see how this comes together by going to the bottom of the file, as shown in listing 4.19. This crew configuration is much like what we’ve seen before. Each agent and task are added, as well as the verbose and process attributes. For this example, we’ll continue to use sequential methods.

Listing 4.19 crewai_coding_crew.py (crew section)
crew = Crew(
    agents=[senior_engineer_agent, 
            qa_engineer_agent, 
            chief_qa_engineer_agent],
    tasks=[code_task, qa_task, evaluate_task],
    verbose=2,  
    process=Process.sequential,      #1
)

# Get your crew to work!
result = crew.kickoff()    #2

print("######################")
print(result)
#1 The process is sequential.

#2 No other context is given at kickoff.

When you run the VS Code (F5) file, you’ll be prompted to enter the instructions for writing a game. Enter some instructions, perhaps the snake game or another game you choose. Then, let the agents work, and observe what they produce.

With the addition of the chief QA engineer, the results will generally look better than what was produced with AutoGen, at least out of the box. If you review the code, you’ll see that it generally follows good patterns and, in some cases, may even include tests and unit tests.

Before we finish the chapter, we’ll make one last change to the crew’s processing pattern. Previously, we employed sequential processing, as shown in figure 4.10. Figure 4.12 shows what hierarchical processing looks like in CrewAI.

Figure 4.12 Hierarchical processing of agents coordinated through a crew manager

Adding this manager is a relatively simple process. Listing 4.20 shows the additional code changes in a new file that uses the coding crew in a hierarchical method. Aside from importing a class for connecting to OpenAI from LangChain, the other addition is setting this class as the crew manager, manager_llm.

Listing 4.20 crewai_hierarchy.py (crew manager section)
from langchain_openai import ChatOpenAI      #1

crew = Crew(
    agents=[senior_engineer_agent, 
            qa_engineer_agent, 
            chief_qa_engineer_agent],
    tasks=[code_task, qa_task, evaluate_task],
    verbose=2,  
    process=Process.hierarchical,     #2
    manager_llm=ChatOpenAI(               #3
        temperature=0, model="gpt-4"      #3
    ),    #4
)         #4
#1 Imports the LLM connector from LangChain

#2 When hierarchical processing is selected, a crew manager must be set.

#3 Sets the crew manager to the LLM connector

#4 When hierarchical processing is selected, a crew manager must be set.

Run this file in VS Code (F5). When prompted, enter a game you want to create. Try using the same game you tried with AutoGen; the snake game is also a good baseline example. Observe the agents work through the code and review it repeatedly for problems.

After you run the file, you can also jump on AgentOps to review the cost of this run. Chances are, it will cost over double what it would have without the agent manager. The output will also likely not be significantly better. This is the trap of building agent systems without understanding how quickly things can spiral.

An example of this spiral that often happens when agents continually iterate over the same actions is frequently repeating tasks. You can view this problem in AgentOps, as shown in figure 4.13, by viewing the Repeat Thoughts plot.

Figure 4.13 Repetition of thoughts over an agent run

The Repeat Thoughts plot from AgentOps is an excellent way to measure the repetition your agent system encounters. Overly repetitive thought patterns typically mean the agent isn’t being decisive enough and instead keeps trying to generate a different answer. If you encounter this problem, you want to change the agents’ processing patterns, tasks, and goals. You may even want to alter the system’s type and number of agents.
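
As a simple stand-in for such a metric, this sketch measures repetition by normalizing thoughts and counting duplicates. This is an assumption about how a repetition score could be computed, not the AgentOps implementation:

```python
from collections import Counter

def repeat_ratio(thoughts: list[str]) -> float:
    """Fraction of thoughts that repeat an earlier (case-insensitive) thought."""
    counts = Counter(t.strip().lower() for t in thoughts)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(thoughts) if thoughts else 0.0

# A hypothetical trace of agent thoughts from a coding run.
trace = [
    "Check the code for syntax errors",
    "check the code for syntax errors",
    "Fix the missing import",
    "Check the code for syntax errors",
]
score = repeat_ratio(trace)  # 2 of the 4 thoughts are repeats
```

A score trending toward 1.0 suggests the agents are looping rather than making progress, which is the cue to rework their processing patterns, tasks, and goals.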

Multi-agent systems are an excellent way to break up work in terms of work patterns of jobs and tasks. Generally, the job role is allocated to an agent role/persona, and the tasks it needs to complete may be implicit, as in AutoGen, or more explicit, as in CrewAI.
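
The explicit, CrewAI-style binding can be pictured with plain dataclasses. This is a toy illustration with hypothetical names, not the CrewAI classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    role: str
    goal: str

@dataclass(frozen=True)
class BoundTask:
    description: str
    expected_output: str
    persona: Persona

researcher = Persona("Joke Researcher", "research what makes {topic} funny")
writer = Persona("Joke Writer", "write a joke on {topic}")

# Explicit style: every task is bound to a persona up front.
plan = [
    BoundTask("identify the humor in {topic}", "a short report", researcher),
    BoundTask("compose a joke on {topic}", "a single joke", writer),
]

assigned = [t.persona.role for t in plan]
```

In the implicit style, by contrast, only the personas would be declared, and the division of work would emerge from the conversation itself.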

In this chapter, we covered many useful tools and platforms that you can use right away to improve your work, life, and more. That completes our journey through multi-agent platforms, but it doesn’t conclude our exploration and use of multiple agents, as we’ll discover in later chapters.

4.6 Exercises

Use the following exercises to improve your knowledge of the material:

  • Exercise 1 —Basic Agent Communication with AutoGen

Objective —Familiarize yourself with basic agent communications and setup in AutoGen.

Tasks:

    • Set up AutoGen Studio on your local machine, following the instructions provided in this chapter.
    • Create a simple multi-agent system with a user proxy and two assistant agents.
    • Implement a basic task where the user proxy coordinates between the assistant agents to generate a simple text output, such as summarizing a short paragraph.
  • Exercise 2 —Implementing Advanced Agent Skills in AutoGen Studio

Objective —Enhance agent capabilities by adding advanced skills.

Tasks:

    • Develop and integrate a new skill into an AutoGen agent that allows it to fetch and display real-time data from a public API (e.g., weather information or stock prices).
    • Ensure the agent can ask for user preferences (e.g., city for weather, type of stocks) and display the fetched data accordingly.
  • Exercise 3 —Role-Based Task Management with CrewAI

Objective —Explore role-based task management in CrewAI.

Tasks:

    • Design a CrewAI setup where multiple agents are assigned specific roles (e.g., data fetcher, analyzer, presenter).
    • Configure a task sequence where the data fetcher collects data, the analyzer processes the data, and the presenter generates a report.
    • Execute the sequence and observe the flow of information and task delegation among agents.
  • Exercise 4 —Multi-Agent Collaboration in Group Chat Using AutoGen

Objective —Understand and implement a group chat system in AutoGen to facilitate agent collaboration.

Tasks:

    • Set up a scenario where multiple agents need to collaborate to solve a complex problem (e.g., planning an itinerary for a business trip).
    • Use the group chat feature to allow agents to share information, ask questions, and provide updates to each other.
    • Monitor the agents’ interactions and effectiveness in collaborative problem solving.
  • Exercise 5 —Adding and Testing Observability with AgentOps in CrewAI

Objective —Implement and evaluate the observability of agents using AgentOps in a CrewAI environment.

Tasks:

    • Integrate AgentOps into a CrewAI multi-agent system.
    • Design a task for the agents that involves significant computation or data processing (e.g., analyzing customer reviews to determine sentiment trends).
    • Use AgentOps to monitor the performance, cost, and output accuracy of the agents. Identify any potential inefficiencies or errors in agent interactions.

Summary

  • AutoGen, developed by Microsoft, is a conversational multi-agent platform that employs a variety of agent types, such as user proxies and assistant agents, to facilitate task execution through natural language interactions.
  • AutoGen Studio acts as a development environment that allows users to create, test, and manage multi-agent systems, enhancing the usability of AutoGen.
  • AutoGen supports multiple communication patterns, including group chats and hierarchical and proxy communications. Proxy communication involves a primary agent (proxy) that interfaces between the user and other agents to streamline task completion.
  • CrewAI offers a structured approach to building multi-agent systems with a focus on enterprise applications. It emphasizes role-based and autonomous agent functionalities, allowing for flexible, sequential, or hierarchical task management.
  • Practical exercises in the chapter illustrate how to set up and use AutoGen Studio, including installing necessary components and running basic multi-agent systems.
  • Agents in AutoGen can be equipped with specific skills to perform tasks such as code generation, image analysis, and data retrieval, thereby broadening their application scope.
  • CrewAI is distinguished by its ability to structure agent interactions more rigidly than AutoGen, which can be advantageous in settings that require precise and controlled agent behavior.
  • CrewAI 支持集成内存和工具,供代理在任务完成过程中使用。
  • CrewAI supports integrating memory and tools for agents to consume through task completion.
  • CrewAI 支持与 AgentOps 等可观测性工具集成,从而提供有关代理性能、交互效率和成本管理的见解。
  • CrewAI supports integration with observability tools such as AgentOps, which provides insights into agent performance, interaction efficiency, and cost management.
  • AgentOps 是一个代理可观测性平台,可以帮助您轻松监控大量的代理交互。
  • AgentOps is an agent observability platform that can help you easily monitor extensive agent interactions.

5 赋予代理行动能力

5 Empowering agents with actions

本章内容

This chapter covers

  • 智能体如何通过动作在自身之外执行操作
  • How an agent acts outside of itself using actions
  • 定义和使用 OpenAI 函数
  • Defining and using OpenAI functions
  • 语义内核以及如何使用语义函数
  • The Semantic Kernel and how to use semantic functions
  • 语义和原生功能的协同作用
  • Synergizing semantic and native functions
  • 使用语义内核实例化 GPT 接口
  • Instantiating a GPT interface with Semantic Kernel

本章我们将探讨如何通过函数实现动作,以及智能体如何使用这些函数。我们将首先了解 OpenAI 的函数调用,然后快速过渡到微软的另一个项目——语义内核 (Semantic Kernel,简称 SK)。我们将使用 SK 来构建和管理智能体的技能和函数,或者让智能体自身也具备这些技能和函数。

In this chapter, we explore actions through the use of functions and how agents can use them as well. We’ll start by looking at OpenAI function calling and then quickly move on to another project from Microsoft called Semantic Kernel (SK), which we’ll use to build and manage skills and functions for agents or as agents.

本章最后,我们将使用 SK 来托管我们的第一个代理系统。本章内容完整,包含大量带注释的代码示例。

We’ll finish the chapter using SK to host our first agent system. This will be a complete chapter with plenty of annotated code examples.

5.1定义代理行为

5.1 Defining agent actions

ChatGPT插件最初是为了给用户会话添加功能、技能或工具而推出的。借助插件,用户可以搜索网络、创建电子表格或图表。插件为ChatGPT提供了扩展平台功能的途径。

ChatGPT plugins were first introduced to provide a session with abilities, skills, or tools. With a plugin, you can search the web or create spreadsheets or graphs. Plugins provide ChatGPT with the means to extend the platform.

图 5.1 展示了 ChatGPT 插件的工作原理。在本例中,ChatGPT 中安装了一个新的电影推荐插件。当用户请求 ChatGPT 推荐一部新电影时,大型语言模型 (LLM) 会识别出它有一个插件来处理该操作。然后,它会将用户请求分解成可执行的参数,并将这些参数传递给新的电影推荐器。

Figure 5.1 shows how a ChatGPT plugin works. In this example, a new movie recommender plugin has been installed in ChatGPT. When a user asks ChatGPT to recommend a new movie, the large language model (LLM) recognizes that it has a plugin to manage that action. It then breaks down the user request into actionable parameters, which it passes to the new movie recommender.

图 5.1 ChatGPT 插件的运行方式,以及插件和其他外部工具(例如 API)如何与“使用外部工具”提示工程策略保持一致

推荐器随后抓取展示新电影的网站,并将该信息附加到发送给 LLM 的新提示请求中。LLM 收到此信息后,会向推荐器返回响应,推荐器再将响应传递给 ChatGPT。ChatGPT 随后会向用户返回推荐请求。

The recommender then scrapes a website showcasing new movies and appends that information to a new prompt request to an LLM. With this information, the LLM responds to the recommender, which passes this back to ChatGPT. ChatGPT then responds to the user with the recommended request.

我们可以将插件视为操作的代理。一个插件通常封装了一项或多项功能,例如调用 API 或抓取网站数据。因此,操作是插件的扩展——它们赋予插件相应的功能。

We can think of plugins as proxies for actions. A plugin generally encapsulates one or more abilities, such as calling an API or scraping a website. Actions, therefore, are extensions of plugins—they give a plugin its abilities.

AI代理可以被视为插件以及插件、工具、技能和其他代理的消费者。向代理/插件添加技能、功能和工具,使其能够执行预定义的操作——图5.2突出显示了代理操作发生的位置及其与LLM和其他系统的交互。

AI agents can be considered plugins and consumers of plugins, tools, skills, and other agents. Adding skills, functions, and tools to an agent/plugin allows it to execute well-defined actions—figure 5.2 highlights where agent actions occur and their interaction with LLMs and other systems.

图 5.2 智能体如何使用动作来执行外部任务

智能体动作是指智能体使用某种功能、技能或工具的能力。容易让人困惑的是,不同的框架使用不同的术语。为了建立一些基本定义,我们将动作定义为智能体可以执行的任何操作。

An agent action is an ability that allows an agent to use a function, skill, or tool. What gets confusing is that different frameworks use different terminology. To establish some basic definitions, we’ll define an action as anything an agent can do.

ChatGPT 插件和函数代表了 ChatGPT 或代理系统可以用来执行额外操作的可操作能力。现在让我们来了解一下 OpenAI 插件的基础和函数定义。

ChatGPT plugins and functions represent an actionable ability that ChatGPT or an agent system can use to perform additional actions. Now let’s examine the basis for OpenAI plugins and the function definition.

5.2执行 OpenAI 函数

5.2 Executing OpenAI functions

OpenAI 通过启用插件,引入了一种结构规范,用于定义 LLM 可以执行的功能/插件之间的接口。该规范正逐渐成为 LLM 系统构建可执行系统的标准。

OpenAI, with the enablement of plugins, introduced a structure specification for defining the interface between functions/plugins an LLM could action. This specification is becoming a standard that LLM systems can follow to provide actionable systems.

这些函数定义现在也被用于定义 ChatGPT 和其他系统的插件。接下来,我们将探讨如何直接通过 LLM 调用来使用这些函数。

These same function definitions are now also being used to define plugins for ChatGPT and other systems. Next, we’ll explore how to use functions directly with an LLM call.

5.2.1向 LLM API 调用添加函数

5.2.1 Adding functions to LLM API calls

图 5.3 展示了 LLM 如何识别和使用函数定义,将其响应转换为函数调用。

Figure 5.3 demonstrates how an LLM recognizes and uses the function definition to cast its response as the function call.

图 5.3 LLM 如何解释单个 LLM 请求(包括工具)

清单 5.1 展示了使用工具和函数定义调用 LLM API 的详细信息。添加函数定义后,LLM 可以响应函数的输入参数。这意味着 LLM 将识别正确的函数并解析用户请求的相关参数。

Listing 5.1 shows the details of an LLM API call using tools and a function definition. Adding a function definition allows the LLM to reply regarding the function’s input parameters. This means the LLM will identify the correct function and parse the relevant parameters for the user’s request.

清单 5.1 first_function.py(API 调用)
response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "system",
                   "content": "You are a helpful assistant."},
                  {"role": "user", "content": user_message}],
        temperature=0.7,
        tools=[     #1
            {
                "type": "function",     #2
                "function": {
                    "name": "recommend",
                    "description": "Provide a … topic.",     #3
                    "parameters": {
                        "type": "object",     #4
                        "properties": {
                            "topic": {
                                "type": "string",
                                "description": 
                                   "The topic,… for.",     #5
                            },
                            "rating": {
                                "type": "string",
                                "description": 
                          "The rating … given.",    #5
                                "enum": ["good",
                                         "bad", 
                                         "terrible"]     #6
                                },
                        },
                        "required": ["topic"],
                    },
                },
                }
            ]
        )
#1 新增名为 tools 的参数

#2 设置工具类型为函数

#3 提供函数功能的详细描述

#4 定义输入参数的类型;对象代表 JSON 文档。

#5 为每个输入参数提供详细描述

#6 甚至可以使用枚举形式进行描述。

要了解其工作原理,请在 VS Code 中打开本书源代码文件夹中的 chapter_4/first_function.py。建议在 VS Code 中打开相关章节的文件夹,创建一个新的 Python 环境并安装 requirements.txt 文件。如果您需要帮助,请参阅附录 B。

To see how this works, open Visual Studio Code (VS Code) to the book’s source code folder: chapter_4/first_function.py. It’s a good practice to open the relevant chapter folder in VS Code to create a new Python environment and install the requirements.txt file. If you need assistance with this, consult appendix B.

开始之前,请确保在 chapter_4 文件夹中正确设置包含 API 凭据的 .env 文件。函数调用是 LLM 商业服务提供的一项额外功能。截至撰写本文时,开源 LLM 部署尚不支持此功能。

Before starting, correctly set up an .env file in the chapter_4 folder with your API credentials. Function calling is an extra capability provided by the LLM commercial service. At the time of writing, this feature wasn’t an option for open source LLM deployments.
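The .env file itself is just KEY=value lines. As a rough, dependency-free sketch of how such a file gets parsed into settings (the book's helper utilities do this for you; the key names shown follow the common OpenAI convention and are placeholders, not real credentials):

```python
# Hypothetical .env contents (normally read from chapter_4/.env)
env_text = """OPENAI_API_KEY=sk-your-key-here
OPENAI_ORG_ID=org-example
"""

settings = {}
for line in env_text.splitlines():
    line = line.strip()
    # Skip blanks and comments; split only on the first '='
    if line and not line.startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()

print(sorted(settings))  # → ['OPENAI_API_KEY', 'OPENAI_ORG_ID']
```

In practice, a library such as python-dotenv handles this parsing and loads the values into the process environment.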

接下来,我们将查看 first_function.py 代码的底部,如清单 5.2 所示。这里仅列举了使用清单 5.1 中先前指定的请求向 LLM 发出的两个调用示例。每个请求都显示了运行示例后生成的输出。

Next, we’ll look at the bottom of the code in first_function.py, as shown in listing 5.2. Here are just two examples of calls made to an LLM using the request previously specified in listing 5.1. Here, each request shows the generated output from running the example.

清单 5.2 first_function.py(API 操作)
user = "Can you please recommend me a time travel movie?"
response = ask_chatgpt(user)     #1
print(response)

###Output
Function(arguments='{"topic":"time travel movie"}', 
                      name='recommend')     #2

user = "Can you please recommend me a good time travel movie?"
response = ask_chatgpt(user)     #3
print(response)

###Output
Function(arguments='{"topic":"time travel movie",
                     "rating":"good"}',
 name='recommend')     #4
#1 先前定义的函数

#2 返回要调用的函数名和提取的输入参数

#3 先前定义的函数

#4 返回要调用的函数名和提取的输入参数

在 VS Code 中使用调试器 (F5) 或终端运行 first_function.py Python 脚本,即可看到相同的结果。这里,LLM 会解析输入请求,以匹配任何已注册的工具。在本例中,该工具是单个函数定义,即 recommend 函数。LLM 从该函数中提取输入参数,并从请求中解析这些参数。然后,它会返回指定的函数及其指定的输入参数。

Run the first_function.py Python script in VS Code using the debugger (F5) or the terminal to see the same results. Here, the LLM parses the input request to match any registered tools. In this case, the tool is the single function definition, that is, the recommend function. The LLM extracts the input parameters from this function and parses those from the request. Then, it replies with the named function and designated input parameters.
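To make the shape of that reply concrete, here is a small, self-contained sketch (no API call is made; the payload is a hypothetical stand-in for the `Function(...)` object in the output above) showing how the returned name and JSON-encoded arguments can be unpacked:

```python
import json

# Hypothetical stand-in for the function call the LLM returns
fake_call = {
    "name": "recommend",
    "arguments": '{"topic": "time travel movie", "rating": "good"}',
}

# The arguments arrive as a JSON string, so they must be parsed before use
args = json.loads(fake_call["arguments"])
print(fake_call["name"], args["topic"], args["rating"])
```

Remember, the calling code (not the LLM) is responsible for actually executing the matching Python function with these parsed parameters.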

5.2.2执行函数调用

5.2.2 Actioning function calls

既然我们已经了解了LLM不会直接执行函数或插件,那么我们可以来看一个执行工具的示例。为了继续围绕推荐系统展开,我们将看另一个添加Python函数以实现简单推荐的示例。

Now that we understand that an LLM doesn’t execute the function or plugin directly, we can look at an example that executes the tools. Keeping with the recommender theme, we’ll look at another example that adds a Python function for simple recommendations.

图 5.4 展示了这个简单示例的工作原理。我们将提交一个包含工具功能定义的请求,请求三个建议。LLM 会回复三个带有输入参数(时间旅行、食谱和礼物)的函数调用。函数执行结果随后会返回给 LLM,LLM 会将结果转换回自然语言并返回回复。

Figure 5.4 shows how this simple example will work. We’ll submit a single request that includes a tool function definition, asking for three recommendations. The LLM, in turn, will reply with the three function calls with input parameters (time travel, recipe, and gift). The results from executing the functions are then passed back to the LLM, which converts them back to natural language and returns a reply.

图 5.4 示例请求返回三个工具函数调用,然后将结果提交回 LLM 以返回自然语言响应。

现在我们已经理解了这个示例,请在 VS Code 中打开 parallel_functions.py。清单 5.3 显示了用于提供推荐的 Python 函数。

Now that we understand the example, open parallel_functions.py in VS Code. Listing 5.3 shows the Python function that you want to call to give recommendations.

清单 5.3 parallel_functions.py(推荐功能)
def recommend(topic, rating="good"):
    if "time travel" in topic.lower():     #1
        return json.dumps({"topic": "time travel",
                           "recommendation": "Back to the Future",
                           "rating": rating})
    elif "recipe" in topic.lower():    #1
        return json.dumps({"topic": "recipe",
                           "recommendation": "The best thing … ate.",
                           "rating": rating})
    elif "gift" in topic.lower():      #1
        return json.dumps({"topic": "gift",
                           "recommendation": "A glorious new...",
                           "rating": rating})
    else:     #2
        return json.dumps({"topic": topic,
                           "recommendation": "unknown"})     #3
#1 检查字符串是否包含在主题输入中

#2 如果未检测到主题,则返回默认值

#3 返回 JSON 对象

接下来,我们将研究名为 run_conversation 的函数,其中所有工作都从请求构造开始。

Next, we’ll look at the function called run_conversation, where all the work starts with the request construction.

清单 5.4 parallel_functions.py(run_conversation,请求)
user = """Can you please make recommendations for the following:
1. Time travel movies
2. Recipes
3. Gifts"""     #1
messages = [{"role": "user", "content": user}]     #2
tools = [     #3
    {
        "type": "function",
        "function": {
            "name": "recommend",
            "description": 
                "Provide a recommendation for any topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "topic": {
                        "type": "string",
                        "description": 
                              "The topic, … recommendation for.",
                        },
                        "rating": {
                            "type": "string",
                            "description": "The rating … was given.",
                            "enum": ["good", "bad", "terrible"]
                            },
                        },
                "required": ["topic"],
                },
            },
        }
    ]
#1 用户消息请求三个推荐。

#2 注意:没有系统消息。

#3 将函数定义添加到请求的工具部分。

清单 5.5 展示了请求的发出过程,我们之前已经介绍过,但有几点需要注意。这次调用使用了较低级别的模型,例如 GPT-3.5,因为委托函数是一项更直接的任务,可以使用较旧、成本较低、功能较少的语言模型来完成。

Listing 5.5 shows the request being made, which we’ve covered before, but there are a few things to note. This call uses a lower model such as GPT-3.5 because delegating functions is a more straightforward task and can be done using older, cheaper, less sophisticated language models.

清单 5.5 parallel_functions.py(run_conversation,API 调用)
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",     #1
    messages=messages,     #2
    tools=tools,     #2
    tool_choice="auto",   #3
)
response_message = response.choices[0].message     #4
#1 将任务委托给函数的 LLM 可以采用更简单的模型。

#2 添加消息和工具定义。

#3 默认值为 auto。

#4 LLM 返回的消息。

此时,API 调用完成后,响应应该包含所需函数调用的信息。请记住,我们要求 LLM 提供三条建议,这意味着它也应该提供三个函数调用输出,如下面的列表所示。

At this point, after the API call, the response should hold the information for the required function calls. Remember, we asked the LLM to provide us with three recommendations, which means it should also provide us with three function call outputs, as shown in the following listing.

清单 5.6 parallel_functions.py(run_conversation,tool_calls)
tool_calls = response_message.tool_calls     #1
if tool_calls:    #1
    available_functions = {
        "recommend": recommend,
    }     #2
    # Step 4: send the info for each function call and function response to 
the model
    for tool_call in tool_calls:     #3
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(
            topic=function_args.get("topic"),     #4
            rating=function_args.get("rating"),
        )
        messages.append(     #5
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
    second_response = client.chat.completions.create(     #6
        model="gpt-3.5-turbo-1106",
        messages=messages,
    )
    return second_response.choices[0].message.content  #6
#1 如果响应包含工具调用,则执行它们。

#2 仅包含一个函数,但可能包含多个函数。

#3 循环遍历这些调用,并将内容回放给 LLM。

#4 根据提取的参数执行推荐函数。

#5 将每次函数调用的结果追加到消息集中。

#6 向 LLM 发送包含更新信息的另一个请求,并返回消息回复。

工具调用输出和推荐函数的调用结果会被追加到消息中。请注意,消息现在还包含首次调用的历史记录。然后,这些信息会被传递回 LLM,以构建自然语言回复。

The tool call outputs and the calls to the recommender function results are appended to the messages. Notice how messages now also contain the history of the first call. This is then passed back to the LLM to construct a reply in natural language.
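The conversation sent back in the second call therefore has a specific shape: the original user message, the assistant message containing the tool calls, and one `tool` message per executed call. A minimal sketch of that message history (the ID and contents here are hypothetical):

```python
import json

messages = [{"role": "user", "content": "Recommend a time travel movie."}]

# Assistant turn with a tool call (normally produced by the LLM, faked here)
messages.append({
    "role": "assistant",
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "recommend",
                                 "arguments": '{"topic": "time travel"}'}}],
})

# Local tool execution result, appended with the matching tool_call_id
messages.append({
    "tool_call_id": "call_1",
    "role": "tool",
    "name": "recommend",
    "content": json.dumps({"topic": "time travel",
                           "recommendation": "Back to the Future"}),
})

print([m["role"] for m in messages])  # → ['user', 'assistant', 'tool']
```

Pairing each `tool` message with the assistant's `tool_call_id` is what lets the LLM match results to the calls it requested.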

在 VS Code 中打开此文件并按 F5 键即可调试此示例。以下清单显示了运行 parallel_functions.py 后的输出。

Debug this example in VS Code by pressing the F5 key with the file open. The following listing shows the output of running parallel_functions.py.

清单 5.7 parallel_functions.py(输出)
Here are some recommendations for you:

1. Time travel movies: "Back to the Future"
2. Recipes: "The best thing you ever ate."
3. Gifts: "A glorious new..." (the recommendation was cut off, so I 
couldn't provide the full recommendation)

I hope you find these recommendations helpful! Let me know if you need 
more information.

至此,这个简单的演示就完成了。对于更高级的应用,这些函数可以执行各种各样的操作,从抓取网站内容到调用搜索引擎,再到完成更复杂的任务。

This completes this simple demonstration. For more advanced applications, the functions could do any number of things, from scraping websites to calling search engines to completing far more complex tasks.

函数是为特定任务生成输出的绝佳方式。然而,处理函数或工具以及进行二次调用的工作可以采用更简洁、更高效的方式。下一节将介绍一种更强大的向代理添加动作的系统。

Functions are an excellent way to cast outputs for a particular task. However, the work of handling functions or tools and making secondary calls can be done in a cleaner and more efficient way. The following section will uncover a more robust system of adding actions to agents.

5.3语义核的引入

5.3 Introducing Semantic Kernel

语义内核 (Semantic Kernel,简称 SK) 是微软的另一个开源项目,旨在帮助构建人工智能应用程序,我们称之为智能体。该项目的核心功能是定义动作,或者平台称之为语义插件,这些插件是对技能和功能的封装。

Semantic Kernel (SK) is another open source project from Microsoft intended to help build AI applications, which we call agents. At its core, the project is best used to define actions, or what the platform calls semantic plugins, which are wrappers for skills and functions.

图 5.5 展示了如何将 SK 用作 OpenAI 插件的插件和使用者。SK 依赖于 OpenAI 插件定义来定义插件。这样,它就可以使用自身或其他插件,并将其发布到其他系统。

Figure 5.5 shows how the SK can be used as a plugin and a consumer of OpenAI plugins. The SK relies on the OpenAI plugin definition to define a plugin. That way, it can consume and publish itself or other plugins to other systems.

图 5.5 语义内核如何作为插件集成以及如何使用插件

OpenAI 插件定义与清单 5.4 中的函数定义完全对应。这意味着 SK 是 API 工具调用(即插件)的协调器。这也意味着 SK 可以帮助组织多个带有聊天界面或代理的插件。

An OpenAI plugin definition maps precisely to the function definitions in listing 5.4. This means that SK is the orchestrator of API tool calls, aka plugins. That also means that SK can help organize multiple plugins with a chat interface or an agent.

SK 是一个用于管理多个插件(代理操作)的实用工具,而且正如我们稍后将看到的,它还可以辅助内存管理和规划工具。本章我们将重点介绍操作/插件。下一节,我们将探讨如何开始使用 SK。

SK is a useful tool for managing multiple plugins (actions for agents) and, as we’ll see later, can also assist with memory and planning tools. For this chapter, we’ll focus on the actions/plugins. In the next section, we look at how to get started using SK.

5.3.1 SK语义功能入门

5.3.1 Getting started with SK semantic functions

SK 易于安装,并可在 Python、Java 和 C# 中运行。这无疑是个好消息,因为它允许用一种语言开发的插件在另一种语言中使用。但是,目前还无法用一种语言开发原生函数并在另一种语言中使用它。

SK is easy to install and works within Python, Java, and C#. This is excellent news as it also allows plugins developed in one language to be consumed in a different language. However, you can’t yet develop a native function in one language and use it in another.

我们将继续使用 VS Code 中 chapter_4 工作区的 Python 环境。如果您想浏览和运行任何示例,请确保已配置好工作区。

We’ll continue from where we left off for the Python environment using the chapter_4 workspace in VS Code. Be sure you have a workspace configured if you want to explore and run any examples.

清单 5.8 展示了如何在 VS Code 的终端中安装 SK。您也可以安装 SK 的 VS Code 扩展。该扩展可以作为创建插件/技能的实用工具,但并非必需。

Listing 5.8 shows how to install SK from a terminal within VS Code. You can also install the SK extension for VS Code. The extension can be a helpful tool to create plugins/skills, but it isn’t required.

清单 5.8 安装语义内核
pip uninstall semantic-kernel     #1

git clone https://github.com/microsoft/semantic-kernel.git     #2

cd semantic-kernel/python     #3

pip install -e .     #4
#1 卸载之前安装的 SK

#2 将存储库克隆到本地文件夹

#3 修改源文件夹

#4 从源文件夹安装可编辑软件包

安装完成后,在 VS Code 中打开 SK_connecting.py。清单 5.9 展示了如何通过 SK 快速运行示例。该示例使用 OpenAI 或 Azure OpenAI 创建聊天补全服务。

Once you finish the installation, open SK_connecting.py in VS Code. Listing 5.9 shows a demo of running an example quickly through SK. The example creates a chat completion service using either OpenAI or Azure OpenAI.

清单 5.9 SK_connecting.py
import semantic_kernel as sk

selected_service = "OpenAI"     #1
kernel = sk.Kernel()     #2

service_id = None
if selected_service == "OpenAI":
    from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

    api_key, org_id = sk.openai_settings_from_dot_env()     #3
    service_id = "oai_chat_gpt"
    kernel.add_service(
        OpenAIChatCompletion(
            service_id=service_id,
            ai_model_id="gpt-3.5-turbo-1106",
            api_key=api_key,
            org_id=org_id,
        ),
    )
elif selected_service == "AzureOpenAI":
    from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

    deployment, api_key, endpoint = 
 sk.azure_openai_settings_from_dot_env()   #4
    service_id = "aoai_chat_completion"
    kernel.add_service(
        AzureChatCompletion(
            service_id=service_id,
            deployment_name=deployment,
            endpoint=endpoint,
            api_key=api_key,
        ),
    )

#This function is currently broken
async def run_prompt():
    result = await kernel.invoke_prompt( 
               prompt="recommend a movie about 
 time travel")     #5
    print(result)

# Use asyncio.run to execute the async function
asyncio.run(run_prompt())     #6

###Output
One highly recommended time travel movie is "Back to the Future" (1985) 
directed by Robert Zemeckis. This classic film follows the adventures of 
teenager Marty McFly (Michael J. Fox)…
#1 设置您正在使用的服务(OpenAI 或 Azure OpenAI)

#2 创建内核

#3 从 .env 文件加载密钥并将其设置到聊天服务

#4 从 .env 文件加载密钥并将其设置到聊天服务

#5 调用提示符

#6 异步调用函数

按 F5(调试)运行示例,您应该会看到类似于清单 5.9 的输出。此示例演示了如何使用 SK 创建并执行语义函数。语义函数相当于提示流(另一个 Microsoft 工具)中的提示模板。在本示例中,我们将一个简单的提示定义为一个函数。

Run the example by pressing F5 (debugging), and you should see an output similar to listing 5.9. This example demonstrates how a semantic function can be created with SK and executed. A semantic function is the equivalent of a prompt template in prompt flow, another Microsoft tool. In this example, we define a simple prompt as a function.

需要注意的是,这个语义函数并没有被定义为插件。但是,内核可以将其创建为一个独立的语义元素,并针对 LLM 执行。语义函数可以单独使用,也可以注册为插件,这一点稍后会详细介绍。接下来,我们进入下一节,介绍上下文变量。

It’s important to note that this semantic function isn’t defined as a plugin. However, the kernel can create the function as a self-contained semantic element that can be executed against an LLM. Semantic functions can be used alone or registered as plugins, as you’ll see later. Let’s jump to the next section, where we introduce contextual variables.

5.3.2语义函数和上下文变量

5.3.2 Semantic functions and context variables

在前一个例子的基础上,我们可以探讨如何向语义函数添加上下文变量。这种向提示模板添加占位符的模式,我们会反复使用。在这个例子中,我们来看一个包含主题、类型、格式和自定义占位符的提示模板。

Expanding on the previous example, we can look at adding contextual variables to the semantic function. This pattern of adding placeholders to prompt templates is one we’ll review over and over. In this example, we look at a prompt template that has placeholders for subject, genre, format, and custom.

在 VS Code 中打开 SK_context_variables.py,如下一清单所示。该提示相当于在提示中设置 system 和 user 两个部分。

Open SK_context_variables.py in VS Code, as shown in the next listing. The prompt is equivalent to setting aside a system and user section of the prompt.

清单 5.10 SK_context_variables.py
#top section omitted…
prompt = """     #1
system:

You have vast knowledge of everything and can recommend anything provided 
you are given the following criteria, the subject, genre, format and any 
other custom information.

user:
Please recommend a {{$format}} with the subject {{$subject}} and {{$genre}}.
Include the following custom information: {{$custom}}
"""

prompt_template_config = sk.PromptTemplateConfig(     #2
    template=prompt,
    name="tldr",
    template_format="semantic-kernel",
    input_variables=[
        InputVariable(
            name="format", 
            description="The format to recommend", 
            is_required=True
        ),
        InputVariable(
            name="subject", 
            description="The subject to recommend", 
            is_required=True
        ),
        InputVariable(
            name="genre", 
            description="The genre to recommend", 
            is_required=True
        ),
        InputVariable(
            name="custom",
            description="Any custom information [CA]
                       to enhance the recommendation",
            is_required=True,
        ),
    ],
    execution_settings=execution_settings,
)

recommend_function = kernel.create_function_from_prompt(     #3
    prompt_template_config=prompt_template_config,
    function_name="Recommend_Movies",
    plugin_name="Recommendation",
)

async def run_recommendation(     #4
    subject="time travel",
    format="movie", 
    genre="medieval", 
           custom="must be a comedy"
):
    recommendation = await kernel.invoke(
        recommend_function,
        sk.KernelArguments(subject=subject,
                      format=format, 
                      genre=genre, 
                      custom=custom),     #5
    )
    print(recommendation)


# Use asyncio.run to execute the async function
asyncio.run(run_recommendation())    #5

###Output
One movie that fits the criteria of being about time travel, set in a 
medieval period, and being a comedy is "The Visitors" (Les Visiteurs) 
from 1993. This French film, directed by Jean-Marie Poiré, follows a 
knight and his squire who are transported to the modern era by a 
wizard’s spell gone wrong.…
#1 Defines the prompt with placeholders

#2 Configures the prompt template and the input variable definitions

#3 Creates a kernel function from the prompt

#4 Creates an async function to wrap the function call

#5 Sets the kernel function arguments

Go ahead and debug this example (F5), and wait for the output to be generated. That is the basis for setting up SK and creating and exercising semantic functions. In the next section, we move on to see how a semantic function can be registered as a skill/plugin.
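The `{{$format}}`-style placeholders in the prompt above follow the semantic-kernel template format. As a rough, stdlib-only illustration of the substitution behavior (this is a sketch, not SK's actual template engine), the placeholders behave like this:

```python
import re

# Sketch of how {{$name}} placeholders in a semantic-kernel style template
# are filled in from keyword arguments. Illustration only -- not SK's
# actual renderer.
def render(template: str, **variables: str) -> str:
    # Replace each {{$name}} token with the matching variable value;
    # unknown placeholders are left untouched.
    return re.sub(
        r"\{\{\$(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

prompt = "Please recommend a {{$format}} with the subject {{$subject}}."
rendered = render(prompt, format="movie", subject="time travel")
print(rendered)  # Please recommend a movie with the subject time travel.
```

SK layers services, execution settings, and function metadata on top of this basic substitution, but the mental model of filling named slots in a prompt is the same.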

5.4 Synergizing semantic and native functions

Semantic functions encapsulate a prompt/profile and execute through interaction with an LLM. Native functions are the encapsulation of code that may perform anything from scraping websites to searching the web. Both semantic and native functions can register as plugins/skills in the SK kernel.

A function, semantic or native, can be registered as a plugin and used the same way we registered the earlier function directly with our API calls. When a function is registered as a plugin, it becomes accessible to chat or agent interfaces, depending on the use case. The next section looks at how a semantic function is created and registered with the kernel.

5.4.1 Creating and registering a semantic skill/plugin

The VS Code extension for SK provides helpful tools for creating plugins/skills. In this section, we’ll use the SK extension to create a plugin/skill and then edit the components of that extension. After that, we’ll register and execute the plugin in the SK.

Figure 5.6 shows the process for creating a new skill within VS Code using the SK extension. (Refer to appendix B for directions if you need to install this extension.) You’ll then be given the option for the skill/plugin folder to place the function. Always group functions that are similar together. After creating a skill, enter the name and description of the function you want to develop. Be sure to describe the function as if the LLM were going to use it.

Figure 5.6 The process of creating a new skill/plugin

You can see the completed skills and functions by opening the skills/plugin folder and reviewing the files. We’ll follow the previously constructed example, so open the skills/Recommender/Recommend_Movies folder, as shown in figure 5.7. Inside this folder is a config.json file, the function description, and the semantic function/prompt in a file called skprompt.txt.

Figure 5.7 The file and folder structure of a semantic function skill/plugin

Listing 5.11 shows the contents of the semantic function definition, also known as the plugin definition. Note that the type is marked as completion and not of type function because this is a semantic function. We would define a native function as a type function.

Listing 5.11 Recommend_Movies/config.json
{
    "schema": 1,
    "type": "completion",     #1
    "description": "A function to recommend movies based on users list of 
previously seen movies.",
    "completion": {     #2
        "max_tokens": 256,
        "temperature": 0,
        "top_p": 0,
        "presence_penalty": 0,
        "frequency_penalty": 0
    },
    "input": {
        "parameters": [
            {
                "name": "input",     #3
                "description": "The users list of previously seen movies.",
                "defaultValue": ""
            }
        ]
    },
    "default_backends": []
}
#1 A semantic function is of type completion.

#2 We can also set the completion parameters for the function call.

#3 Defines the input parameters for the semantic function.

Next, we can look at the definition of the semantic function prompt, as shown in listing 5.12. The format is a little different, but what we see here matches the earlier examples using templating. This prompt recommends movies based on a list of movies the user has previously seen.

Listing 5.12 Recommend_Movies/skprompt.txt
You are a wise movie recommender and you have been asked to recommend a 
movie to a user.
You are provided a list of movies that the user has watched before.
You want to recommend a movie that the user has not watched before.
[INPUT]
{{$input}}
[END INPUT]

Now, we’ll dive into the code that loads the skill/plugin and executes it in a simple example. Open the SK_first_skill.py file in VS Code. The following listing shows an abridged version highlighting the new sections.

Listing 5.13 SK_first_skill.py (abridged listing)
kernel = sk.Kernel()

plugins_directory = "plugins"

recommender = kernel.import_plugin_from_prompt_directory(
    plugins_directory,
    "Recommender",
)     #1

recommend = recommender["Recommend_Movies"]

seen_movie_list = [     #2
    "Back to the Future",
    "The Terminator",
    "12 Monkeys",
    "Looper",
    "Groundhog Day",
    "Primer",
    "Donnie Darko",
    "Interstellar",
    "Time Bandits",
    "Doctor Strange",
]


async def run():
    result = await kernel.invoke(
        recommend,
        sk.KernelArguments(     #3
            settings=execution_settings, input=", ".join(seen_movie_list)
        ),
    )
    print(result)


asyncio.run(run())     #4

###Output
Based on the list of movies you've provided, it seems you have an 
interest in science fiction, time travel, and mind-bending narratives. 
Given that you've watched a mix of classics and modern films in this 
genre, I would recommend the following movie that you have not watched 
before:

"Edge of Tomorrow" (also known as "Live Die Repeat: Edge of Tomorrow")…
#1 Loads the prompt from the plugins folder

#2 The list of movies the user has previously seen

#3 The input is set to the list of seen movies.

#4 The function executes asynchronously.

The code loads the skill/plugin from the skills directory and the plugin folder. When a skill is loaded into the kernel and not just created, it becomes a registered plugin. That means it can be executed directly as is done here or through an LLM chat conversation via the plugin interface.

Run the code (F5), and you should see an output like listing 5.13. We now have a simple semantic function that can be hosted as a plugin. However, this function requires users to input a complete list of movies they have watched. We’ll look at a means to fix this by introducing native functions in the next section.

5.4.2 Applying native functions

As stated, native functions are code that can do anything. In the following example, we’ll introduce a native function to assist the semantic function we built earlier.

This native function will load a list of movies the user has previously seen, from a file. While this function introduces the concept of memory, we’ll defer that discussion until chapter 8. Consider this new native function as any code that could virtually do anything.
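Before running the example, the native function expects a seen_movies.txt file in the working directory, one title per line. A minimal stdlib sketch to create and read one (the titles here are illustrative, chosen to mirror the sample output later in the section):

```python
# Create the seen_movies.txt file that the native function reads,
# one movie title per line. The titles are illustrative.
titles = [
    "The Matrix",
    "The Matrix Reloaded",
    "The Matrix Revolutions",
    "The Matrix Resurrections",
]
with open("seen_movies.txt", "w") as f:
    f.write("\n".join(titles))

# Read it back the same way the native function will: strip each line
# and join the titles into a comma-separated string.
with open("seen_movies.txt") as f:
    lines = [line.strip() for line in f.readlines()]
print(", ".join(lines))
```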

Native functions can be created and registered using the SK extension. For this example, we’ll create a native function directly in code to make the example easier to follow.

Open SK_native_functions.py in VS Code. We’ll start by looking at how the native function is defined. A native function is typically defined within a class, which simplifies managing and instantiating native functions.

Listing 5.14 SK_native_functions.py (MySeenMoviesDatabase)
class MySeenMoviesDatabase:
    """
    Description: Manages the list of users seen movies.     #1
    """
    @kernel_function(     #2
        description="Loads a list of movies … user has already seen",
        name="LoadSeenMovies",
    )
    def load_seen_movies(self) -> str:     #3
        try:
            with open("seen_movies.txt", 'r') as file:     #4
                lines = [line.strip() for line in file.readlines()]
                comma_separated_string = ', '.join(lines)
            return comma_separated_string
        except Exception as e:
            print(f"Error reading file: {e}")
            return None
#1 Provides a description for the container class

#2 Uses a decorator to provide the function description and name

#3 The actual function returns the list of movies as a comma-separated string.

#4 Loads the seen movies from a text file

With the native function defined, we can see how it’s used by scrolling down in the file, as shown in the following listing.

Listing 5.15 SK_native_functions.py (remaining code)
plugins_directory = "plugins"

recommender = kernel.import_plugin_from_prompt_directory(
    plugins_directory,
    "Recommender",
)     #1

recommend = recommender["Recommend_Movies"]

seen_movies_plugin = kernel.import_plugin_from_object(
    MySeenMoviesDatabase(), "SeenMoviesPlugin"
)     #2

load_seen_movies = seen_movies_plugin["LoadSeenMovies"]     #3

async def show_seen_movies():
    seen_movie_list = await load_seen_movies(kernel)
    return seen_movie_list

seen_movie_list = asyncio.run(show_seen_movies())     #4
print(seen_movie_list)

async def run():      #5
    result = await kernel.invoke(
        recommend,
        sk.KernelArguments(
                settings=execution_settings,
                input=seen_movie_list),
    )
    print(result)


asyncio.run(run())    #5

###Output
The Matrix, The Matrix Reloaded, The Matrix Revolutions, The Matrix 
Resurrections – output from print statement
Based on your interest in the "The Matrix" series, it seems you enjoy 
science fiction films with a strong philosophical undertone and action 
elements. Given that you've watched all
#1 Loads the semantic function (as before)

#2 Imports the skill into the kernel and registers the function as a plugin

#3 Loads the native function

#4 Executes the function and returns the list as a string

#5 Wraps the plugin call in an async function and executes it

One important aspect to note is how the native function was imported into the kernel. The act of importing to the kernel registers that function as a plugin/skill. This means the function can be used as a skill from the kernel through other conversations or interactions. We’ll see how to embed a native function within a semantic function in the next section.

5.4.3 Embedding native functions within semantic functions

There are plenty of powerful features within SK, but one beneficial feature is the ability to embed native or semantic functions within other semantic functions. The following listing shows how a native function can be embedded within a semantic function.

Listing 5.16 SK_semantic_native_functions.py (sk_prompt)
sk_prompt = """
You are a wise movie recommender and you have been asked to recommend a 
movie to a user.
You have a list of movies that the user has watched before.
You want to recommend a movie that 
the user has not watched before.     #1
Movie List: {{MySeenMoviesDatabase.LoadSeenMovies}}.     #2
"""
#1 The same instruction text as before

#2 The native function is referenced and identified by its class name and function name.

The next example, SK_semantic_native_functions.py, uses inline native and semantic functions. Open the file in VS Code, and the following listing shows the code to create, register, and execute the functions.

Listing 5.17 SK_semantic_native_functions.py (excerpt)
prompt_template_config = sk.PromptTemplateConfig(
    template=sk_prompt,
    name="tldr",
    template_format="semantic-kernel",
    execution_settings=execution_settings,
)     #1

recommend_function = kernel.create_function_from_prompt(
    prompt_template_config=prompt_template_config,
    function_name="Recommend_Movies",
    plugin_name="Recommendation",
)     #2


async def run_recommendation():     #3
    recommendation = await kernel.invoke(
        recommend_function,
        sk.KernelArguments(),
    )
    print(recommendation)


# Use asyncio.run to execute the async function
asyncio.run(run_recommendation())
###Output
Based on the list provided, it seems the user is a fan of the Matrix 
franchise. Since they have watched all four existing Matrix movies, I 
would recommend a…
#1 Creates the prompt template configuration for the prompt

#2 Creates an inline semantic function from the prompt

#3 Executes the semantic function asynchronously

Run the code, and you should see an output like listing 5.17. One important aspect to note is that the native function is registered with the kernel, but the semantic function is not. This is important because function creation doesn’t register a function.

For this example to work correctly, the native function must be registered with the kernel, which uses the import_plugin function call—the first line in listing 5.17. However, the semantic function itself isn’t registered. An easy way to register the function is to make it a plugin and import it.

These simple exercises showcase ways to integrate plugins and skills into chat or agent interfaces. In the next section, we’ll look at a complete example demonstrating adding a plugin representing a service or GPT interface to a chat function.

5.5 Semantic Kernel as an interactive service agent

In chapter 1, we introduced the concept of the GPT interface—a new paradigm in connecting services and other components to LLMs via plugins and semantic layers. SK provides an excellent abstraction for converting any service to a GPT interface.

Figure 5.8 shows a GPT interface constructed around an API service called The Movie Database (TMDB; www.themoviedb.org). The TMDB site provides a free API that exposes information about movies and TV shows.

Figure 5.8 This layer architecture diagram shows the role of the GPT interface and Semantic Kernel exposed to a chat or agent interface.

To follow along with the exercises in this section, you must register for a free account with TMDB and create an API key. Instructions for getting an API key can be found at the TMDB website (www.themoviedb.org) or by asking GPT-4 Turbo or a more recent LLM.

Over the next set of subsections, we’ll create a GPT interface using an SK set of native functions. Then, we’ll use the SK kernel to test the interface and, later in this chapter, implement it as plugins into a chat function. In the next section, we look at building a GPT interface against the TMDB API.
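A note on the API key: rather than hardcoding it in your source, as the upcoming listing does for simplicity, you can read it from an environment variable. Here's a small sketch (the TMDB_API_KEY variable name is our own choice for this example, not something TMDB mandates):

```python
import os

# Read the TMDB API key from an environment variable instead of
# hardcoding it in tmdb.py. TMDB_API_KEY is an arbitrary name chosen
# for this sketch.
api_key = os.environ.get("TMDB_API_KEY", "")
if not api_key:
    print("Warning: set TMDB_API_KEY before calling the TMDB API.")
```

This keeps the key out of version control, which matters once you start sharing plugin code.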

5.5.1 Building a semantic GPT interface

TMDB is an excellent service, but it provides no semantic services or services that can be plugged into ChatGPT or an agent. To do that, we must wrap the API calls that TMDB exposes in a semantic service layer.

A semantic service layer is a GPT interface that exposes functions through natural language. As discussed, to expose functions to ChatGPT or other interfaces such as agents, they must be defined as plugins. Fortunately, SK can create the plugins for us automatically, given that we write our semantic service layer correctly.

A native plugin or set of skills can act as a semantic layer. To create a native plugin, create a new plugin folder, and put a Python file holding a class containing the set of native functions inside that folder. The SK extension currently doesn’t do this well, so manually creating the module works best.

Figure 5.9 shows the structure of the new plugin called Movies and the semantic service layer called tmdb.py. For native functions, the parent folder’s name (Movies) is used in the import.

Figure 5.9 The folder and file structure of the TMDB plugin

Open the tmdb.py file in VS Code, and look at the top of the file, as shown in listing 5.18. This file contains a class called TMDbService, which exposes several functions that map to API endpoint calls. The idea is to map the various relevant API function calls in this semantic service layer. This will expose the functions as plugins for a chat or agent interface.

Listing 5.18 tmdb.py (top of file)
from semantic_kernel.functions import kernel_function
import requests
import inspect


def print_function_call():     #1
    #omitted …


class TMDbService:     #2
    def __init__(self):
        # enter your TMDb API key here
        self.api_key = "your-TMDb-api-key"


    @kernel_function(     #2
        description="Gets the movie genre ID for a given genre name",
        name="get_movie_genre_id",
        input_description="The movie genre name of the genre_id to get",
        )
    def get_movie_genre_id(self, genre_name: str) -> str:     #3
        print_function_call()
        base_url = "https://api.themoviedb.org/3"
        endpoint = (f"{base_url}/genre/movie/list"
                    f"?api_key={self.api_key}&language=en-US")

        response = requests.get(endpoint)     #4
        if response.status_code == 200:    #4
            genres = response.json()['genres']
            for genre in genres:
                if genre_name.lower() in genre['name'].lower():
                    return str(genre['id'])     #5
        return None
#1 Prints the function call for debugging

#2 The top-level service and the decorator used to describe the function (good descriptions are critical)

#3 The function is wrapped in the semantic wrapper and should return a string.

#4 Calls the API endpoint and, if successful (status code 200), checks for a matching genre

#5 Genre found; returns its ID

The bulk of the code for the TMDbService and the functions to call the TMDB endpoints was written with the help of GPT-4 Turbo. Then, each function was wrapped with the kernel_function decorator to expose it semantically.

A few of the TMDB API calls have been mapped semantically. Listing 5.19 shows another example of a function exposed to the semantic service layer. This function pulls a current top 10 list of movies playing for a particular genre.

Listing 5.19 tmdb.py (get_top_movies_by_genre)
    @kernel_function(     #1
        description=
            "Gets a list of currently playing movies for a given genre",
        name="get_top_movies_by_genre",
        input_description="The genre of the movies to get",
        )
    def get_top_movies_by_genre(self, genre: str) -> str:
        print_function_call()
        genre_id = self.get_movie_genre_id(genre)     #2
        if genre_id:
            base_url = "https://api.themoviedb.org/3"
            playing_movies_endpoint = (
                f"{base_url}/movie/now_playing"
                f"?api_key={self.api_key}&language=en-US")
            response = requests.get(
                          playing_movies_endpoint)     #3
            if response.status_code != 200:
                return ""

            playing_movies = response.json()['results']
            for movie in playing_movies:     #4
                movie['genre_ids'] = [str(gid)
                       for gid in movie['genre_ids']]
            filtered_movies = [movie for movie
                in playing_movies if genre_id
                in movie['genre_ids']][:10]     #5
            results = ", ".join([movie['title'] for movie in filtered_movies])
            return results
        else:
            return ""
#1 Adds a description for the function

#2 Looks up the genre ID for the given genre name

#3 Gets the list of currently playing movies

#4 Converts the genre IDs to strings

#5 Checks whether the genre ID matches the movie's genres

Look through the various other API calls mapped semantically. As you can see, there is a well-defined pattern for converting API calls to a semantic service. Before we run the full service, we’ll test each of the functions in the next section.
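The filtering portion of that pattern can be exercised without a live API call. Here it is applied to a mocked now_playing payload (the movie data is made up for illustration; 28 is TMDB's ID for the Action genre):

```python
# The genre-filtering pattern from the TMDB service, run against a
# mocked now_playing payload instead of a live API response.
payload = {
    "results": [
        {"title": "Movie A", "genre_ids": [28, 12]},
        {"title": "Movie B", "genre_ids": [35]},
        {"title": "Movie C", "genre_ids": [28]},
    ]
}
genre_id = "28"  # TMDB's ID for the Action genre

playing_movies = payload["results"]
for movie in playing_movies:
    # Convert the genre IDs to strings, as the service does.
    movie["genre_ids"] = [str(gid) for gid in movie["genre_ids"]]
filtered = [m for m in playing_movies
            if genre_id in m["genre_ids"]][:10]
print(", ".join(m["title"] for m in filtered))  # Movie A, Movie C
```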

5.5.2 Testing semantic services

In a real-world application, you’ll likely want to write a complete set of unit or integration tests for each semantic service function. We won’t do that here; instead, we’ll write a quick helper script to test the various functions.

Open test_tmdb_service.py in VS Code, and review the code, as shown in listing 5.20. You can comment and uncomment any functions to test them in isolation. Be sure to have only one function uncommented at a time.

Listing 5.20 test_tmdb_service.py
import semantic_kernel as sk
from plugins.Movies.tmdb import TMDbService

async def main():
    kernel = sk.Kernel()     #1

    tmdb_service = kernel.import_plugin_from_object(
        TMDbService(), "TMDBService")     #2

    print(
        await tmdb_service["get_movie_genre_id"](
            kernel, sk.KernelArguments(
                            genre_name="action")     #3
        )
    )     #4
    print(
        await tmdb_service["get_tv_show_genre_id"](
            kernel, sk.KernelArguments(
                            genre_name="action")     #5
        )
    )     #6
    print(
        await tmdb_service["get_top_movies_by_genre"](
            kernel, sk.KernelArguments(
                            genre="action")     #7
        )
    )     #8
    print(
        await tmdb_service["get_top_tv_shows_by_genre"](
            kernel, sk.KernelArguments(
                            genre="action")    #7
        )
    )
    print(await tmdb_service["get_movie_genres"](
        kernel, sk.KernelArguments()))     #9
    print(await tmdb_service["get_tv_show_genres"](
        kernel, sk.KernelArguments()))     #9


# Run the main function
if __name__ == "__main__":
    import asyncio

    asyncio.run(main())     #10

###Output
Function name: get_top_tv_shows_by_genre     #11
Arguments:
  self = <skills.Movies.tmdb.TMDbService object at 0x00000159F52090C0>
  genre = action
Function name: get_tv_show_genre_id    #11
Arguments:
  self = <skills.Movies.tmdb.TMDbService object at 0x00000159F52090C0>
  genre_name = action
Arcane, One Piece, Rick and Morty, Avatar: The Last Airbender, Fullmetal 
Alchemist: Brotherhood, Demon Slayer: Kimetsu no Yaiba, Invincible, 
Attack on Titan, My Hero Academia, Fighting Spirit, The Owl House
#1 Instantiates the kernel

#2 Imports the plugin service

#3 Inputs arguments to the function as needed

#4 Executes and tests the various functions

#5 Inputs arguments to the function as needed

#6 Executes and tests the various functions

#7 Inputs arguments to the function as needed

#8 Executes and tests the various functions

#9 Executes and tests the various functions

#10 Executes the main function asynchronously

#11 The printed function-call details show when each function is called

This test shows the real power of SK. Notice that the TMDbService class is imported as a plugin, yet we didn't have to define any plugin configuration beyond what we had already written. By writing one class that wraps a few API functions, we've exposed part of the TMDB API semantically. Now, with the functions exposed, we can look at how they can be used as plugins for a chat interface in the next section.

5.5.3 Interactive chat with the semantic service layer

With the TMDB functions exposed semantically, we can move on to integrating them into a chat interface. This will allow us to converse naturally in this interface to get various information, such as current top movies.

Open SK_service_chat.py in VS Code. Scroll down to the start of the new section of code that creates the functions, as shown in listing 5.21. The functions created here are now exposed as plugins, except we filter out the chat function, which we don’t want to expose as a plugin. The chat function here allows the user to converse directly with the LLM and shouldn’t be a plugin.

Listing 5.21 SK_service_chat.py (function setup)
system_message = "You are a helpful AI assistant."

tmdb_service = kernel.import_plugin_from_object(
TMDbService(), "TMDBService")     #1

# extracted section of code
execution_settings = sk_oai.OpenAIChatPromptExecutionSettings(
        service_id=service_id,
        ai_model_id=model_id,
        max_tokens=2000,
        temperature=0.7,
        top_p=0.8,
        tool_choice="auto",
        tools=get_tool_call_object(
            kernel, {"exclude_plugin": ["ChatBot"]}),     #2
    )

prompt_config = sk.PromptTemplateConfig.from_completion_parameters(
    max_tokens=2000,
    temperature=0.7,
    top_p=0.8,
    function_call="auto",
    chat_system_prompt=system_message,
)     #3
prompt_template = OpenAIChatPromptTemplate(
    "{{$user_input}}", kernel.prompt_template_engine, prompt_config
)     #4

history = ChatHistory()

history.add_system_message("You recommend movies and TV Shows.")
history.add_user_message("Hi there, who are you?")
history.add_assistant_message(
    "I am Rudy, the recommender chat bot. I'm trying to figure out what 
people need."
)     #5

chat_function = kernel.create_function_from_prompt(
    prompt_template_config=prompt_template,
    plugin_name="ChatBot",
    function_name="Chat",
)     #6
#1 Imports the TMDbService as a plugin

#2 Configures the execution settings and adds the filtered tools

#3 Configures the prompt configuration

#4 Defines the input template, accepting the full string as user input

#5 Adds the chat history object and populates some history

#6 Creates the chat function

Next, we can continue by scrolling in the same file to review the chat function, as shown in the following listing.

Listing 5.22 SK_service_chat.py (chat function)
async def chat() -> bool:
    try:
        user_input = input("User:> ")     #1
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":     #2
        print("\n\nExiting chat...")
        return False
    arguments = sk.KernelArguments(     #3
        user_input=user_input,
        history=("\n").join(
           [f"{msg.role}: {msg.content}" for msg in history]),
    )
    result = await chat_completion_with_tool_call(     #4
        kernel=kernel,
        arguments=arguments,
        chat_plugin_name="ChatBot",
        chat_function_name="Chat",
        chat_history=history,
    )
    print(f"AI Agent:> {result}")
    return True
#1 Input is taken directly from the terminal/console.

#2 Exits the chat if the user enters exit.

#3 Creates the arguments to pass to the function.

#4 Calls the function and executes the tools using a utility function.

Lastly, scroll down to the bottom of the file, and review the primary function. This is the code that calls the chat function in a loop.

Listing 5.23 SK_service_chat.py (main function)
async def main() -> None:
    chatting = True
    context = kernel.create_new_context()

    print("Welcome to your first AI Agent\     #1
\n  Type 'exit' to exit.\
\n  Ask to get a list of currently playing movies by genre."
    )
    while chatting:     #2
        chatting, context = await chat(context)     #3


if __name__ == "__main__":
    asyncio.run(main())
#1 Introduction for the user

#2 Continues executing until chatting becomes False

#3 Calls the chat function asynchronously

Run the chat interface by running the file (F5), and then ask about movies or television shows of a particular genre. An example conversation session is shown in listing 5.24. This output shows how a request to list movies from two genres made the chat interface make multiple calls to the get_top_movie_by_genre function.

Listing 5.24 SK_service_chat.py (example conversation)
Welcome to your first AI Agent
  Type 'exit' to exit.
  Ask to get a list of currently playing movies by genre.
User:> Input: can you give me list of the current top playing movies for 
the action and comedy genres?

Function name: get_top_movies_by_genre     #1
Arguments:
  genre = action
Function name: get_movie_genre_id     #2
Arguments:
  genre_name = action
Function name: get_top_movies_by_genre    #1
Arguments:
  genre = comedy
Function name: get_movie_genre_id    #2
Arguments:
  genre_name = comedy
Agent:> Here are the current top-playing movies 
for the action and comedy genres:

**Action:**     #3
1. The Hunger Games: The Ballad of Songbirds & Snakes
2. Rebel Moon - Part One: A Child of Fire
3. Aquaman and the Lost Kingdom
4. Silent Night
5. The Family Plan
6. Freelance
7. Migration
8. Sound of Freedom
9. Godzilla Minus One

**Comedy:**     #4
1. The Family Plan
2. Wonka
3. Freelance
4. Saltburn
5. Chicken Run: Dawn of the Nugget
6. Trolls Band Together
7. There's Something in the Barn
8. Migration

Please note that some movies may overlap in both genres, such as 
"The Family Plan" and "Freelance."
#1 The LLM calls the get_top_movies_by_genre function twice.

#2 Internal calls to get the genre ID.

#3 The list of current top action movies.

#4 The list of current top comedy movies.

Be sure to explore the chat interface’s boundaries and what you can ask for from the TMDB service. For example, try asking for a list of genres for movies or television shows. This service is a good first try, but we can perhaps do better, as we’ll see in the next section.

5.6 Thinking semantically when writing semantic services

Now we’ve seen an excellent demonstration of converting an API into a semantic service interface. As it is, the functions return the titles of the top movies and television shows currently playing. However, by just returning the titles, we’re limiting the ability of the LLM to parse the results on its own.

Therefore, we’ll create a v2 version of TMDbService to correct this and return the results as JSON strings. Open the file tmdb_v2.py in VS Code, and scroll down to the get_top_movies_by_genre function.

Listing 5.25 tmdb_v2.py (get_top_movies_by_genre)
def get_top_movies_by_genre(self, genre: str) -> str:
        print_function_call()
        genre_id = self.get_movie_genre_id(genre)
        if genre_id:
            #same code …

            return json.dumps(filtered_movies)     #1
        else:
            return ""
#1 Now returns a filtered list as a JSON string.

Now open SK_service_chat.py in VS Code, and comment and uncomment the line shown in listing 5.26. This will then use version 2 of the TMDbService that outputs results as full JSON documents in a single string.

Listing 5.26 SK_service_chat.py (altering the import)
#from skills.Movies.tmdb import TMDbService     #1
from skills.Movies.tmdb_v2 import TMDbService     #2
#1 Comment out this line.

#2 Uncomment this line to use version 2 of the service.

Rerun the SK_service_chat.py file in VS Code, and alter your query slightly, as shown by the output in the following listing.

Listing 5.27 SK_service_chat.py (TMDb_v2 service output)
User:> get a list of currently playing movies for the 
action genre and only return movies about space     #1
Agent:> To find currently playing action movies that are specifically 
about space, I will need to manually filter the provided list for those 
that have space-related themes in their overview. Here's what fits that 
criteria from the list:

1. **Rebel Moon - Part One: A Child of Fire**     #2
   - Release Date: 2023-12-15
   - Overview: When a peaceful colony on the edge of the galaxy finds 
itself threatened by the armies of the tyrannical Regent Balisarius, 
they dispatch Kora, a young woman with a mysterious past, to seek out 
warriors from neighboring planets to help them take a stand.

This is the only movie from the provided list that clearly mentions a 
space-related theme in its overview. …
#1 The new query asks for an additional filter about space.

#2 The LLM calls the service and then reviews the returned results that match the filter.

Because the semantic service functions now return the complete movie listing in JSON, the LLM can apply additional filtering. This is the real power of semantic services, allowing you to process the data through the LLM. We won’t see this power by just returning a list of titles.
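To make the contrast concrete, here's a minimal sketch of the two return styles: bare titles versus the full JSON the LLM can filter on. The movie records here are made-up illustrative data, not real TMDB responses.

```python
import json

# Hypothetical TMDB-style results (illustrative data, not a real API response)
movies = [
    {"title": "Rebel Moon - Part One: A Child of Fire",
     "release_date": "2023-12-15",
     "overview": "A peaceful colony on the edge of the galaxy is threatened."},
    {"title": "Silent Night",
     "release_date": "2023-12-01",
     "overview": "A grieving father seeks revenge on a violent gang."},
]

def titles_only(results):
    # v1 style: bare titles strip away the context the LLM could filter on
    return ", ".join(m["title"] for m in results)

def full_json(results):
    # v2 style: complete records let the LLM filter, sort, and transform
    return json.dumps(results)

print(titles_only(movies))
print(full_json(movies))
```

With the v2 style, a request such as "only return movies about space" gives the LLM the overview text it needs to apply the filter itself.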

This last exercise demonstrated the change in mentality you need to make when writing semantic service layers. Generally, you'll want to return as much information as possible. Returning more information takes advantage of the LLM's ability to filter, sort, and transform data independently. In the next chapter, we'll explore building autonomous agents using behavior trees.

5.7 Exercises

Complete the following exercises to improve your knowledge of the material:

  • Exercise 1—Creating a Basic Plugin for Temperature Conversion

Objective —Familiarize yourself with creating a simple plugin for the OpenAI chat completions API.

Tasks:

    • Develop a plugin that converts temperatures between Celsius and Fahrenheit.
    • Test the plugin by integrating it into a simple OpenAI chat session where users can ask for temperature conversions.
  • Exercise 2—Developing a Weather Information Plugin

Objective —Learn to create a plugin that performs a unique task.

Tasks:

    • Create a plugin for the OpenAI chat completions API that fetches weather information from a public API.
    • Ensure the plugin can handle user requests for current weather conditions in different cities.
  • Exercise 3—Crafting a Creative Semantic Function

Objective —Explore the creation of semantic functions.

Tasks:

    • Develop a semantic function that writes a poem or tells a children’s story based on user input.
    • Test the function in a chat session to ensure it generates creative and coherent outputs.
  • Exercise 4—Enhancing Semantic Functions with Native Functions

Objective —Understand how to combine semantic and native functions.

Tasks:

    • Create a semantic function that uses a native function to enhance its capabilities.
    • For example, develop a semantic function that generates a meal plan and uses a native function to fetch nutritional information for the ingredients.
  • Exercise 5—Wrapping an Existing Web API with Semantic Kernel

Objective —Learn to wrap existing web APIs as semantic service plugins.

Tasks:

    • Use SK to wrap a news API and expose it as a semantic service plugin in a chat agent.
    • Ensure the plugin can handle user requests for the latest news articles on various topics.

Summary

  • Agent actions extend the capabilities of an agent system, such as ChatGPT. This includes the ability to add plugins to ChatGPT and LLMs to function as proxies for actions.
  • OpenAI supports function definitions and plugins within an OpenAI API session. This includes adding function definitions to LLM API calls and understanding how these functions allow the LLM to perform additional actions.
  • The Semantic Kernel (SK) is an open source project from Microsoft that can be used to build AI applications and agent systems. This includes the role of semantic plugins in defining native and semantic functions.
  • Semantic functions encapsulate the prompt/profile template used to engage an LLM.
  • Native functions encapsulate code that performs or executes an action using an API or other interface.
  • Semantic functions can be combined with other semantic or native functions and layered within one another as execution stages.
  • SK can be used to create a GPT interface over the top of API calls in a semantic service layer and expose them as chat or agent interface plugins.
  • Semantic services represent the interaction between LLMs and plugins, as well as the practical implementation of these concepts in creating efficient AI agents.

6 Building autonomous assistants

This chapter covers

  • Behavior trees for robotics and AI apps
  • GPT Assistants Playground and creating assistants and actions
  • Autonomous control of agentic behavior trees
  • Simulating conversational multi-agent systems via agentic behavior trees
  • Using back chaining to create behavior trees for complex systems

Now that we’ve covered how actions extend the power/capabilities of agents, we can look at how behavior trees can guide agentic systems. We’ll start by understanding the basics of behavior trees and how they control robotics and AI in games.

We’ll return to agentic actions and examine how actions can be implemented on the OpenAI Assistants platform using the GPT Assistants Playground project. From there, we’ll look at how to build an autonomous agentic behavior tree (ABT) using OpenAI assistants. Then, we’ll move on to understanding the need for controls and guardrails on autonomous agents and using control barrier functions.

In the final section of the chapter, we’ll examine the use of the AgentOps platform to monitor our autonomous behavior-driven agentic systems. This will be an exciting chapter with several challenges. Let’s begin by jumping into the next section, which introduces behavior trees.

6.1 Introducing behavior trees

Behavior trees are a long-established pattern used to control robotics and AI in games. Rodney A. Brooks first introduced the concept in his “A Robust Layered Control System for a Mobile Robot” paper in 1986. This laid the groundwork for a pattern that expanded on using the tree and node structure we have today.

If you’ve ever played a computer game with nonplayer characters (NPCs) or interacted with advanced robotic systems, you’ve witnessed behavior trees at work. Figure 6.1 shows a simple behavior tree. The tree represents all the primary nodes: selector or fallback nodes, sequence nodes, action nodes, and condition nodes.

Figure 6.1 A simple behavior tree for eating an apple or a pear

Table 6.1 describes the functions and purpose of the primary nodes we’ll explore in this book. There are other nodes and node types, and you can even create custom nodes, but for now, we’ll focus on those in the table.

Table 6.1 The primary nodes used in behavior trees

Selector (fallback)
  Purpose: This node works by selecting the first child that completes successfully. It's often called a fallback node because it always falls back to the last successfully executing node.
  Function: The node calls its children in order and stops executing at the first child that succeeds. It returns success when a child succeeds and failure if no child succeeds.
  Type: Composite

Sequence
  Purpose: This node executes all of its children in order until one fails or all complete successfully.
  Function: The node calls its children in order, returning success if all children succeed and failure if a single child fails.
  Type: Composite

Condition
  Purpose: Behavior trees don't use Boolean logic; instead, success and failure act as the means of control. A condition returns success if it's true and failure otherwise.
  Function: The node returns success or failure depending on the condition.
  Type: Task

Action
  Purpose: This is where the action happens.
  Function: The node executes and returns success if it completes; otherwise, it returns failure.
  Type: Task

Decorator
  Purpose: These nodes work by controlling the execution of child nodes. They are often called conditionals because they can determine whether a node is worth executing or safe to execute.
  Function: The node controls the execution of a child node. Decorators can act as control barrier functions that block or prevent unwanted behaviors.
  Type: Decorator

Parallel
  Purpose: This node executes all of its children in parallel. Success or failure is controlled by a threshold representing the number of children that must succeed.
  Function: The node executes all of its children, regardless of each child's status.
  Type: Composite

The primary nodes in table 6.1 can provide enough functionality to handle numerous use cases. However, understanding behavior trees initially can be daunting. You won’t appreciate their underlying complexity until you start using them. Before we build some simple trees, we want to look at execution in more detail in the next section.

6.1.1 Understanding behavior tree execution

Understanding how behavior trees execute is crucial to designing and implementing behavior trees. Unlike most concepts in computer science, behavior trees operate in terms of success and failure. When a node in a behavior tree executes, it will return either success or failure; this even applies to conditions and selector nodes.

Behavior trees execute from top to bottom and left to right. Figure 6.2 shows the process and what happens if a node fails or succeeds. In the example, the AI the tree controls has a pear but no apple. In the first sequence node, a condition checks if the AI has an apple. Because the AI doesn't have an apple, it aborts the sequence and falls back to the selector. The selector then selects its next child node, another sequence, that checks if the AI has a pear, and because it does, the AI eats the pear.

Figure 6.2 The execution process of a simple behavior tree

Behavior trees provide control over how an AI system will execute at a macro or micro level. Regarding robotics, behavior trees will typically be designed to operate at the micro level, where each action or condition is a small event, such as detecting the apple. Conversely, behavior trees can also control more macro systems, such as NPCs in games, where each action may be a combination of events, like attacking the player.
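To see these execution rules outside of any framework, here's a toy sketch in plain Python (illustrative only, not the py_trees API) that mirrors the apple/pear tree: the selector tries each sequence in order, and a sequence stops at its first failing child.

```python
from typing import Callable, List

Status = str  # "SUCCESS" or "FAILURE" (a real library uses an enum)

def sequence(children: List[Callable[[], Status]]) -> Status:
    # Executes children in order; fails as soon as one child fails
    for child in children:
        if child() == "FAILURE":
            return "FAILURE"
    return "SUCCESS"

def selector(children: List[Callable[[], Status]]) -> Status:
    # Tries children in order; succeeds as soon as one child succeeds
    for child in children:
        if child() == "SUCCESS":
            return "SUCCESS"
    return "FAILURE"

inventory = {"apple": False, "pear": True}  # the AI has a pear but no apple

def has(fruit: str) -> Callable[[], Status]:
    # Condition node: succeeds only if the fruit is in the inventory
    return lambda: "SUCCESS" if inventory[fruit] else "FAILURE"

def eat(fruit: str) -> Callable[[], Status]:
    # Action node: consumes the fruit and reports success
    def action() -> Status:
        print(f"Eating {fruit}")
        inventory[fruit] = False
        return "SUCCESS"
    return action

def tick() -> Status:
    return selector([
        lambda: sequence([has("apple"), eat("apple")]),
        lambda: sequence([has("pear"), eat("pear")]),
    ])

print(tick())  # the apple sequence fails its condition, so the pear is eaten
```

The key behavior to notice is the fail-fast/succeed-fast control flow: no Booleans, just success and failure propagating up the tree on each tick.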

For agentic systems, behavior trees support controlling an agent or assistant at your chosen level. We’ll explore controlling agents at the task and, in later chapters, the planning levels. After all, with the power of LLMs, agents can construct their own behavior tree.

Of course, several other forms of AI control could be used to control agentic systems. The next section will examine those different systems and compare them to behavior trees.

6.1.2 Deciding on behavior trees

Numerous other AI control systems have benefits and are worth exploring in controlling agentic systems. They can demonstrate the benefits of behavior trees and provide other options for specific use cases. The behavior tree is an excellent pattern, but it isn’t the only one, and it’s worth learning about others.

Table 6.2 highlights several other systems we may consider for controlling AI systems. Each item in the table describes what the method does, its shortcomings, and its possible application to agentic AI control.

Table 6.2 Comparison of other AI control systems

Finite state machine (FSM) [a]
  Description: FSMs model AI using a set of states and transitions triggered by events or conditions.
  Shortcomings: FSMs can become unmanageable as complexity increases.
  Controlling agentic AI? FSMs aren't practical for agents because they don't scale well.

Decision trees [b]
  Description: Decision trees use a tree-like model to represent decisions and their possible consequences.
  Shortcomings: Decision trees can suffer from overfitting and poor generalization in complex scenarios.
  Controlling agentic AI? Decision trees can be adapted and enhanced with behavior trees.

Utility-based systems [b]
  Description: Utility functions evaluate and select the best action based on the current situation.
  Shortcomings: These systems require careful design of the utility functions to balance priorities.
  Controlling agentic AI? This pattern can be applied within behavior trees.

Rule-based systems
  Description: A set of if-then rules defines the AI's behavior.
  Shortcomings: These systems can become bloated with many rules, leading to potential conflicts.
  Controlling agentic AI? These methods are less practical when combined with LLM-powered agent systems.

Planning systems [c]
  Description: Planning systems use planning algorithms to generate a sequence of actions to achieve a specific goal.
  Shortcomings: These systems are computationally expensive and require extensive domain knowledge.
  Controlling agentic AI? As we'll see in later chapters, agents can already implement these patterns on their own.

Behavior cloning [c]
  Description: Behavior cloning learns a policy by imitating expert demonstrations.
  Shortcomings: The system may struggle to generalize to unseen situations.
  Controlling agentic AI? This can be incorporated into behavior trees or specific tasks.

Hierarchical task network (HTN) [d]
  Description: HTNs break tasks down into smaller, more manageable subtasks arranged hierarchically.
  Shortcomings: At scale, these are complex to manage and design.
  Controlling agentic AI? HTNs are better at organizing and executing complex tasks. This pattern can be used in larger agentic systems.

Blackboard systems [b]
  Description: These systems use collaborative problem solving, in which different subsystems cooperate through a shared blackboard.
  Shortcomings: These systems are difficult to implement, and communication between subsystems is hard to manage.
  Controlling agentic AI? Agentic systems can implement similar patterns using conversations or group chats/threads.

Genetic algorithms (GA) [d]
  Description: These optimization techniques, inspired by natural selection, evolve solutions to problems.
  Shortcomings: GAs are computationally intensive and don't always find the optimal solution.
  Controlling agentic AI? GAs have potential and could even be used to optimize behavior trees.

[a] Not practical when considering complex agentic systems.
[b] Present in behavior trees or easy to incorporate.
[c] Typically applied at the task or action/condition level.
[d] Advanced systems that require substantial work when applied to agents.
In later chapters of this book, we’ll investigate some of the patterns discussed in table 6.2. Overall, several patterns can be enhanced or incorporated using behavior trees as the base. While other patterns, such as FSMs, may be helpful for small experiments, they lack the scalability of behavior trees.
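As a quick illustration of why FSMs strain at scale, here's a minimal sketch with hypothetical states and events: every new behavior adds another batch of (state, event) transitions to maintain by hand, whereas a behavior tree simply gains a subtree.

```python
# A minimal finite state machine: states plus event-triggered transitions.
# Note how the threat event must be wired into every state it can interrupt.
transitions = {
    ("idle", "sees_food"): "eating",
    ("eating", "finished"): "idle",
    ("idle", "sees_threat"): "fleeing",
    ("eating", "sees_threat"): "fleeing",  # each state handles this separately
    ("fleeing", "safe"): "idle",
}

def step(state: str, event: str) -> str:
    # Unknown (state, event) pairs leave the state unchanged
    return transitions.get((state, event), state)

state = "idle"
for event in ["sees_food", "sees_threat", "safe"]:
    state = step(state, event)
print(state)  # the NPC ends up back in "idle"
```

With a handful of states this is manageable; with dozens of behaviors, the transition table grows combinatorially, which is the scalability problem behavior trees avoid.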

Behavior trees can provide several benefits as an AI control system, including scalability. The following list highlights other notable benefits of using behavior trees:

  • Modularity and reusability—Behavior trees promote a modular approach to designing behaviors, allowing developers to create reusable components. Nodes in a behavior tree can be easily reused across different parts of the tree or even in different projects, enhancing maintainability and reducing development time.
  • Scalability—As systems grow in complexity, behavior trees handle the addition of new behaviors more gracefully than other methods, such as FSMs. Behavior trees allow for the hierarchical organization of tasks, making it easier to manage and understand large behavior sets.
  • Flexibility and extensibility—Behavior trees offer a flexible framework where new nodes (actions, conditions, decorators) can be added without drastically altering the existing structure. This extensibility makes it straightforward to introduce new behaviors or modify existing ones to adapt to new requirements.
  • Debugging and visualization—Behavior trees provide a clear and intuitive visual representation of behaviors, which is beneficial for debugging and understanding the decision-making process. Tools that support behavior trees often include graphical editors that allow developers to visualize and debug the tree structure, making it easier to identify and fix problems.
  • Decoupling of decision logic—Behavior trees separate the decision-making and execution logic, promoting a clear distinction between high-level strategy and low-level actions. This decoupling simplifies the design and allows for more straightforward modifications and testing of specific behavior parts without affecting the entire system.

Having made a strong case for behavior trees, we should now consider how to implement them in code. In the next section, we look at how to build a simple behavior tree, using Python code.

6.1.3 Running behavior trees with Python and py_trees

Because behavior trees have been around for so long and have been incorporated into many technologies, creating a sample demonstration is very simple. Of course, the easiest way is to ask ChatGPT or your favorite AI chat tool. Listing 6.1 shows the result of using a prompt to generate the code sample and submitting figure 6.1 as the example tree. The final code had to be corrected for simple naming and parameter errors.

Listing 6.1 first_btree.py
import py_trees

class HasApple(py_trees.behaviour.Behaviour):      #1
    def __init__(self, name):
        super(HasApple, self).__init__(name)

    def update(self):        
        if True:  
            return py_trees.common.Status.SUCCESS
        else:
            return py_trees.common.Status.FAILURE
# Other classes omitted…

has_apple = HasApple(name="Has apple")      #2
eat_apple = EatApple(name="Eat apple")      #2
sequence_1 = py_trees.composites.Sequence(name="Sequence 1", memory=True)
sequence_1.add_children([has_apple, eat_apple])                              #3

has_pear = HasPear(name="Has pear")         #4
eat_pear = EatPear(name="Eat pear")         #4
sequence_2 = py_trees.composites.Sequence(name="Sequence 2", memory=True)
sequence_2.add_children([has_pear, eat_pear])               #3                

root = py_trees.composites.Selector(name="Selector", memory=True)
root.add_children([sequence_1, sequence_2])          #3                       

behavior_tree = py_trees.trees.BehaviourTree(root)     #5

py_trees.logging.level = py_trees.logging.Level.DEBUG   
for i in range(1, 4):                                                      #6
    print("\n------------------ Tick {0} ------------------".format(i))
    behavior_tree.tick()                                                  #6

### Start of output
------------------ Tick 1 ------------------
[DEBUG] Selector             : Selector.tick()
[DEBUG] Selector             : Selector.tick() [!RUNNING->reset current_child]
[DEBUG] Sequence 1           : Sequence.tick()
[DEBUG] Has apple            : HasApple.tick()
[DEBUG] Has apple            : HasApple.stop(Status.INVALID->Status.SUCCESS)
[DEBUG] Eat apple            : EatApple.tick()
Eating apple
[DEBUG] Eat apple            : EatApple.stop(Status.INVALID->Status.SUCCESS)
[DEBUG] Sequence 1           : Sequence.stop()[Status.INVALID->Status.SUCCESS]
#1 Creates a class to implement an action or condition

#2 Creates the action and condition nodes

#3 Adds the nodes to their respective parents

#4 Creates the action and condition nodes

#5 Creates the full behavior tree

#6 Executes a step (tick) through the behavior tree

The code in listing 6.1 represents the behavior tree in figure 6.1. You can run this code as is or alter what the conditions return and then run the tree again. You can also change the behavior tree by removing one of the sequence nodes from the root selector.

Now that we have a basic understanding of behavior trees, we can move on to working with agents/assistants. Before doing that, we’ll look at a tool to help us work with OpenAI Assistants. This tool will help us wrap our first ABTs around OpenAI Assistants.

6.2探索 GPT 助手游乐场

6.2 Exploring the GPT Assistants Playground

为了编写本书,我们创建了多个 GitHub 项目,分别探讨构建智能体和助手的各个方面。其中一个项目名为 GPT Assistants Playground,它使用 Gradio 构建界面,模仿了 OpenAI Assistants Playground,但添加了一些额外功能。

For the development of this book, several GitHub projects were created to address various aspects of building agents and assistants. One such project, the GPT Assistants Playground, is built using Gradio for the interface that mimics the OpenAI Assistants Playground but with several extras added.

Playground 项目旨在作为教学和演示辅助工具。项目内部的 Python 代码使用 OpenAI Assistants API 创建聊天界面和代理系统,用于构建和驱动各种助手。此外,它还提供了一系列丰富的助手操作示例供您使用,您也可以轻松添加自己的操作。

The Playground project was developed as both a teaching and demonstration aid. Inside the project, the Python code uses the OpenAI Assistants API to create a chat interface and an agentic system to build and power assistants. There is also a comprehensive collection of actions that assistants can use, and you can easily add your own actions.

6.2.1安装和运行 Playground

6.2.1 Installing and running the Playground

以下代码展示了如何从终端安装并运行 Playground 项目。目前没有 PyPI 包可供安装。

The following listing shows installing and running the Playground project from the terminal. There is currently no PyPI package to install.

清单 6.2安装 GPT Assistants Playground
# change to a working folder and create a new Python virtual environment
git clone 
 https://github.com/cxbxmxcx/GPTAssistantsPlayground     #1
cd GPTAssistantsPlayground      #2
pip install -r requirements.txt      #3
#1 从 GitHub 拉取源代码

#2 将目录切换到项目源代码文件夹

#3 安装依赖项

您可以从终端或使用 Visual Studio Code (VS Code) 运行应用程序,后者可提供更多控制选项。运行应用程序之前,您需要通过命令行或创建 .env 文件(我们之前已多次演示过)来设置 OpenAI API 密钥。清单 6.3 展示了在 Linux/Mac 或 Git Bash shell(Windows 上推荐使用)中设置环境变量并运行应用程序的示例。

You can run the application from the terminal or using Visual Studio Code (VS Code), with the latter giving you more control. Before running the application, you need to set your OpenAI API key through the command line or by creating an .env file, as we’ve done a few times already. Listing 6.3 shows an example of setting the environment variable on Linux/Mac or in the Git Bash shell (recommended on Windows) and then running the application.

清单 6.3运行 GPT 助手 Playground
export OPENAI_API_KEY="your-api-key"      #1
python main.py     #2
#1 将 API 密钥设置为环境变量

#2 从终端或通过 VS Code 运行应用程序
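If you go the .env route instead of exporting the variable, a dotenv-style loader reads the file at startup; a minimal stand-in looks like this (illustrative only — the project's actual loading code may differ):

```python
import os
import pathlib

def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' comment lines ignored."""
    for line in pathlib.Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            # Don't clobber variables already exported in the shell
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

With a .env file containing `OPENAI_API_KEY="your-api-key"`, calling `load_env()` before the app starts makes the key available through `os.environ`.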

打开浏览器访问显示的 URL(通常为http://127.0.0.1:7860)或终端中提到的内容。您将看到一个类似于图 6.3 所示的界面。如果您已经定义了 OpenAI 助手,您将在“选择助手”下拉菜单中看到它们。

Open your browser to the URL displayed (typically http://127.0.0.1:7860) or what is mentioned in the terminal. You’ll see an interface similar to that shown in figure 6.3. If you’ve already defined OpenAI assistants, you’ll see them in the Select Assistant dropdown.

图 6.3用于学习数学的 GPT Assistants Playground 界面

如果您从未定义过助手,您可以创建一个,并选择所需的各种选项和指令。如果您访问过 OpenAI Playground,您应该已经体验过类似的界面。

If you’ve never defined an assistant, you can create one and choose the various options and instructions you need. If you’ve visited the OpenAI Playground, you’ve already experienced a similar interface.

接下来的几节将更详细地介绍这些特性。下一节我们将首先探讨如何使用和消费操作。

Each of these features is covered in more detail over the next few sections. We’ll start with a look at using and consuming actions in the next section.

6.2.2使用和构建自定义操作

6.2.2 Using and building custom actions

动作和工具是赋予智能体和助手功能的基石。如果没有工具,智能体就只是毫无功能的聊天机器人。正如我们在第三章中所看到的,OpenAI 平台在构建许多工具模式方面处于领先地位。

Actions and tools are the building blocks that empower agents and assistants. Without access to tools, agents are functionless chatbots. The OpenAI platform is a leader in establishing many of the patterns for tools, as we saw in chapter 3.
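As a refresher on that pattern, a function is exposed to the model as a JSON schema describing its name, purpose, and parameters. The sketch below derives such a schema from a plain Python function (simplified: every parameter is typed as a string, which real tooling would not assume):

```python
import inspect

def to_tool_schema(func):
    # Build an OpenAI-style function-tool schema from a plain function.
    params = list(inspect.signature(func).parameters)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": params,
            },
        },
    }

def save_file(filename, content):
    """Save content to a file."""

print(to_tool_schema(save_file)["function"]["name"])  # save_file
```

The docstring becomes the description the model uses to decide when to call the tool — which is why, as we'll see shortly, the Playground insists on well-documented action functions.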

Playground 提供了多种自定义操作,可以通过界面将其附加到助手上。在接下来的练习中,我们将构建一个简单的助手,并附加几个自定义操作,看看有哪些功能。

The Playground provides several custom actions that can be attached to assistants through the interface. In this next exercise, we’ll build a simple assistant and attach a couple of custom actions to see what is possible.

图 6.4 显示了展开的“操作”折叠面板,其中显示了许多可用的自定义操作。从终端或调试器运行 Playground,并创建一个新的助手。然后,选择图中所示的操作。选择完操作后,滚动到底部,然后单击“添加助手”以添加助手。助手必须先创建才能使用。

Figure 6.4 shows the expanded Actions accordion, which displays many available custom actions. Run the Playground from the terminal or debugger, and create a new assistant. Then, select the actions shown in the figure. After you’re done selecting the actions, scroll to the bottom, and click Add Assistant to add the assistant. Assistants need to be created before they can be used.

图 6.4在界面中选择和使用自定义操作

创建助手后,您可以让它列出所有可用的助手。列出助手后,您还会获得调用助手所需的 ID。您还可以调用其他助手,并让他们完成其专业领域的任务。

After you create the assistant, you can ask it to list all available assistants. Listing the assistants also gives you the IDs required to call the assistant. You can also call other assistants and ask them to complete tasks in their area of specialization.

添加自定义操作非常简单,只需将代码添加到文件并将其放入正确的文件夹即可。从主项目文件夹打开 playground/assistant_actions 文件夹,您会看到几个定义各种操作的文件。在 VS Code 中打开 file_actions.py 文件,如清单 6.4 所示。

Adding your custom actions is as simple as adding code to a file and dropping it in the right folder. Open the playground/assistant_actions folder from the main project folder, and you’ll see several files that define the various actions. Open the file_actions.py file in VS Code, as shown in listing 6.4.

清单 6.4 playground/assistant_actions/file_actions.py
import os

from playground.actions_manager import agent_action

OUTPUT_FOLDER = "assistant_outputs"


@agent_action     #1
def save_file(filename, content):      #2
    """
    Save content to a file.      #3

    :param filename: The name of the file including extension.
    :param content: The content to save in the file.
    """
    file_path = os.path.join(OUTPUT_FOLDER, filename)
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(content)
    print(f"File '{filename}' saved successfully.")      #4
#1 此装饰器会自动将函数添加为一个操作。

#2 为函数取一个清晰且与其用途相符的名称。

#3 描述是助手用来判断函数的信息,因此请务必详细记录。

#4 通常会返回一条消息,说明成功或失败。

您可以通过将文件放入 assistant_actions 文件夹并使用 agent_action 装饰器对其进行装饰,来添加任何您想要的自定义操作。只需确保为函数指定一个合适的名称,并编写高质量的文档说明其使用方法即可。Playground 启动时,它会加载文件夹中所有已正确装饰且具有描述/文档的操作。

You can add any custom action you want by placing the file in the assistant_actions folder and decorating it with the agent_action decorator. Just make sure to give the function a good name and enter quality documentation for how the function should be used. When the Playground starts up, it loads all the actions in the folder that are decorated correctly and have descriptions/documentation.
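The actions_manager internals aren't shown in this chapter, but a decorator like agent_action can be sketched as a simple registry that captures each function along with its docstring (hypothetical — the Playground's real implementation also builds the JSON schema the API needs):

```python
ACTION_REGISTRY = {}

def agent_action(func):
    # Refuse undocumented actions: the docstring is what the assistant
    # reads when deciding whether to call the function.
    if not func.__doc__:
        raise ValueError(f"action '{func.__name__}' needs a docstring")
    ACTION_REGISTRY[func.__name__] = {
        "callable": func,
        "description": func.__doc__.strip(),
    }
    return func

@agent_action
def save_file(filename, content):
    """Save content to a file."""
    return f"File '{filename}' saved successfully."  # simplified: no disk I/O
```

At startup, a loader only has to import the modules in the actions folder; decorated, documented functions register themselves.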

就这么简单。您可以根据需要添加多个自定义操作。下一节,我们将介绍一种特殊的自定义操作,它允许助手在本地运行代码。

It’s that simple. You can add several custom actions as needed. In the next section, we’ll look at a special custom action that allows the assistant to run code locally.

6.2.3安装助手数据库

6.2.3 Installing the assistants database

要运行本章中的几个示例,您需要安装助手数据库。幸运的是,这可以通过界面轻松完成,只需询问代理即可。接下来的说明详细介绍了助手的安装过程,这些说明直接取自 GPT Assistants Playground 的 README 文件。您可以安装位于 assistants.db SQLite 数据库中的几个演示助手:

To run several of the examples in this chapter, you’ll need to install the assistants database. Fortunately, this can be easily done through the interface and just by asking agents. The upcoming instructions detail the process for installing the assistants and are taken directly from the GPT Assistants Playground README. You can install several of the demo assistants located in the assistants.db SQLite database:

  1. 创建新的助手,或使用现有的助手。
  1. Create a new assistant, or use an existing assistant.
  2. 向助手添加 create_manager_assistant 操作(可在“操作”部分找到)。
  2. Give the assistant the create_manager_assistant action (found under the Actions section).
  3. 请助理创建经理助理(即,“请创建经理助理”),并确保将助理命名为“Manager Assistant”。
  3. Ask the assistant to create the manager assistant (i.e., “please create the manager assistant”), and be sure to name the assistant “Manager Assistant.”
  4. 刷新浏览器以重新加载助手选择器。
  4. Refresh your browser to reload the assistants selector.
  5. 选择新的管理助手。该助手包含从 assistants.db 数据库中安装助手所需的指令和操作。
  5. Select the new Manager Assistant. This assistant has the instructions and actions that will allow it to install assistants from the assistants.db database.
  6. 与经理助理交谈,获取要安装的助手列表,或者直接要求经理助理安装所有可用的助手。
  6. Talk to the Manager Assistant to give you a list of assistants to install, or just ask the Manager Assistant to install all available assistants.

6.2.4让助手在本地运行代码

6.2.4 Getting an assistant to run code locally

让代理和助手生成并运行可执行代码具有强大的功能。与代码解释器不同,本地运行代码提供了大量快速迭代和调优的机会。我们之前在 AutoGen 中就看到了这一点,代理可以不断运行代码,直到它按预期工作为止。

Getting agents and assistants to generate and run executable code has a lot of power. Unlike the Code Interpreter, running code locally provides numerous opportunities to iterate and tune quickly. We saw this earlier with AutoGen, where the agents could keep running the code until it worked as expected.

在 Playground 中,选择 run_code 自定义操作非常简单,如图 6.5 所示。您还需要选择 run_shell_command 操作,因为它允许助手通过 pip install 安装任何所需的模块。

In the Playground, it’s a simple matter to select the custom action run_code, as shown in figure 6.5. You’ll also want to choose the run_shell_command action because it allows the assistant to pip install any required modules.
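The chapter doesn't show run_code's internals, but the core idea of such an action can be sketched with the standard library (simplified — as noted below, the real action executes inside a dedicated virtual environment rather than the current interpreter):

```python
import os
import subprocess
import sys
import tempfile

def run_code(code: str, timeout: int = 30) -> str:
    # Write the generated code to a temp file and run it with the
    # current interpreter, capturing stdout/stderr for the assistant.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)
```

Returning stderr on failure is what lets the assistant read the traceback and iterate on its own code.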

图 6.5选择自定义操作以供助手运行 Python 代码

现在您可以让助手生成并运行代码,以确保其正常运行。您可以尝试添加自定义操作,然后让助手生成并运行代码,如图 6.6 所示。如果代码未按预期运行,请告知助手您遇到的问题。

You can now ask an assistant to generate and run the code to be sure it works on your behalf. Try this out by adding the custom actions and asking the assistant to generate and run code, as shown in figure 6.6. If the code doesn’t work as expected, tell the assistant what problems you encountered.

图 6.6让助手生成并运行 Python 代码

同样,在 Playground 中运行的 Python 代码会在项目子文件夹中创建一个新的虚拟环境。如果您不运行任何操作系统级代码或底层代码,此系统运行良好。如果您需要更强大的功能,AutoGen 是一个不错的选择,它使用 Docker 容器来运行隔离的代码。

Again, the Python code running in the Playground creates a new virtual environment in a project subfolder. This system works well if you’re not running any operating system–level code or low-level code. If you need something more robust, a good option is AutoGen, which uses Docker containers to run isolated code.

添加操作来运行代码或其他任务可能会让助手感觉像个黑匣子。幸运的是,OpenAI Assistants API 允许你接收事件并查看助手在后台的运行情况。下一节我们将看到它的具体实现方式。

Adding actions to run code or other tasks can make assistants feel like a black box. Fortunately, the OpenAI Assistants API allows you to consume events and see what the assistant is doing behind the scenes. In the next section, we’ll see what this looks like.

6.2.5通过日志调查助手流程

6.2.5 Investigating the assistant process through logs

OpenAI 在 Assistants API 中添加了一项功能,允许您监听通过工具/操作使用而产生的事件和操作链。此功能已集成到 Playground 中,可在助手调用另一个助手时捕获操作和工具使用情况。

OpenAI added a feature into the Assistants API that allows you to listen to events and actions chained through tool/action use. This feature has been integrated into the Playground, capturing action and tool use when an assistant calls another assistant.

我们可以尝试让助手使用某个工具,然后打开日志。一个很好的例子是,您可以让助手使用代码解释器工具,然后让它绘制一个方程式。图 6.7 展示了这个练习的示例。

We can try this by asking an assistant to use a tool and then open the log. A great example of how you can do this is by giving an assistant the Code Interpreter tool and then asking it to plot an equation. Figure 6.7 shows an example of this exercise.

图 6.7正在捕获的内部助理日志

通常情况下,启用“辅助代码解释器”工具后,您不会看到任何代码生成或执行过程。此功能允许您实时查看辅助程序使用的所有工具和操作。它不仅是出色的诊断工具,还能让您更深入地了解 LLM 的功能。

Usually, when the Assistant Code Interpreter tool is enabled, you don’t see any code generation or execution. This feature allows you to see all tools and actions used by the assistant as they happen. Not only is it an excellent tool for diagnostics, but it also provides additional insights into the functions of LLMs.

由于代码量庞大,而且很可能会经历多次修改,所以我们还没有对代码进行全面审查。即便如此,如果您计划使用 Assistants API,这个项目是一个很好的起点。有了 Playground,我们可以在下一节继续探索 ABT。

We haven’t reviewed the code to do all this because it’s extensive and will likely undergo several changes. That being said, if you plan on working with the Assistants API, this project is a good place to start. With the Playground introduced, we can continue our journey into ABTs in the next section.

6.3引入智能体行为树

6.3 Introducing agentic behavior trees

智能体行为树(ABT)在助手和智能体系统中实现行为树。ABT 与常规行为树的主要区别在于,前者使用提示来指导动作和条件。由于提示可能会返回大量随机结果,我们也可以将这类树称为随机行为树,这类树确实存在。为简便起见,我们将用于控制智能体的行为树统称为智能体行为树。

Agentic behavior trees (ABTs) implement behavior trees on assistant and agent systems. The key difference between regular behavior trees and ABTs is that they use prompts to direct actions and conditions. Because prompts may return a high occurrence of random results, we could also name these trees stochastic behavior trees, which do exist. For simplicity, we’ll differentiate behavior trees used to control agents, referring to them as agentic.

接下来,我们将进行一个创建 ABT 的练习。最终的树状图将用 Python 编写,但需要设置和配置各种助手。我们将介绍如何使用助手本身来管理它们。

Next, we’ll undertake an exercise to create an ABT. The finished tree will be written in Python but will require the setup and configuration of various assistants. We’ll cover how to manage assistants using the assistants themselves.

6.3.1用助理管理助理

6.3.1 Managing assistants with assistants

幸运的是,Playground 可以帮助我们快速管理和创建助手。我们将首先安装管理助手,然后再安装预定义的助手。让我们按照以下步骤开始安装管理助手:

Fortunately, the Playground can help us quickly manage and create the assistants. We’ll first install the Manager Assistant, followed by installing the predefined assistants. Let’s get started with installing the Manager Assistant using the following steps:

  1. 在浏览器中打开 Playground,创建一个新的简单助手或使用现有助手。如果需要创建新助手,请先创建,然后选择它。
  1. Open Playground in your browser, and create a new simple assistant or use an existing assistant. If you need a new assistant, create it and then select it.
  2. 选中助手后,打开“操作”折叠面板,然后选择 create_manager_assistant 操作。无需保存;界面会自动更新助手。
  2. With the assistant selected, open the Actions accordion, and select the create_manager_assistant action. You don’t need to save; the interface will update the assistant automatically.
  3. 现在,在聊天界面中,向助手发出以下提示:“请创建经理助手。”
  3. Now, in the chat interface, prompt the assistant with the following: “Please create the manager assistant.”
  4. 几秒钟后,助手会提示完成。刷新浏览器,确认管理助手是否可用。如果由于某种原因没有显示新助手,请尝试重启 Gradio 应用。
  4. After a few seconds, the assistant will say it’s done. Refresh your browser, and confirm that the Manager Assistant is now available. If, for some reason, the new assistant isn’t shown, try restarting the Gradio app itself.

经理助理就像一个拥有所有权限的管理员。使用经理助理时,请务必具体说明您的请求。激活经理助理后,您现在可以按照以下步骤安装本书中使用的其他助理:

The Manager Assistant is like an admin that has access to everything. When engaging the Manager Assistant, be sure to be specific about your requests. With the Manager Assistant active, you can now install new assistants used in the book using the following steps:

  1. 选择管理助手。如果您修改过管理助手,可以随时删除并重新安装。虽然可以安装多个管理助手,但不建议这样做。
  1. Select the Manager Assistant. If you’ve modified the Manager Assistant, you can delete it and reinstall it anytime. Although it’s possible to have multiple Manager Assistants, it’s not recommended.
  2. 向经理助理询问可以安装哪些助手,请在聊天界面输入以下内容:
  2. Ask the Manager Assistant what assistants can be installed by typing the following in the chat interface:
请列出所有可安装的助手。
Please list all the installable assistants.
  3. 当你要求管理助理安装某个助理时,请明确指出你想安装的是哪个助理:
  3. Identify which assistant you want installed when you ask the Manager Assistant to install it:
请安装Python编码助手。
Please install the Python Coding Assistant.

您可以使用 Playground 管理和安装任何可用的助手。您还可以让管理助手将所有助手的定义保存为 JSON 格式:

You can manage and install any available assistants using the Playground. You can also ask the Manager Assistant to save the definitions of all your assistants as JSON:

请将所有助手信息以 JSON 格式保存到名为 assistants.json 的文件中。
Please save all the assistants as JSON to a file called assistants.json.

经理助手可以执行所有操作,但这些操作应被视为独特且谨慎使用。在设计助手时,最好使其目标明确,并将操作限制在所需范围内。这不仅可以避免让AI做出过多决策,还可以避免因幻觉而导致的意外或错误。

The Manager Assistant can access all actions, which should be considered unique and used sparingly. When crafting assistants, it’s best to keep them goal specific and limit the actions to just what they need. This not only avoids giving the AI too many decisions but also avoids accidents or mistakes caused by hallucinations.

在本章剩余的练习中,您可能需要安装所需的助手。或者,您可以让管理员助手安装所有可用的助手。无论哪种方式,我们都将在下一节中介绍如何创建带有助手的 ABT。

As we go through the remaining exercises in this chapter, you’ll likely need to install the required assistants. Alternatively, you can ask the Manager Assistant to install all available assistants. Either way, we look at creating an ABT with assistants in the next section.

6.3.2构建编码挑战 ABT

6.3.2 Building a coding challenge ABT

编码挑战为测试和评估智能体和助手系统提供了一个良好的基准。挑战和基准测试可以量化智能体或智能体系统的运行性能。我们在第四章中已经将编码挑战应用于多平台智能体,例如 AutoGen 和 CrewAI。

Coding challenges provide a good baseline for testing and evaluating agent and assistant systems. Challenges and benchmarks can quantify how well an agent or agentic system operates. We already applied coding challenges to multi-platform agents in chapter 4 with AutoGen and CrewAI.

这次的编程挑战,我们将更进一步,选择 Edabit 网站 ( https://edabit.com ) 上的 Python 编程挑战,这些挑战的难度从入门到专家级不等。我们将选择专家级挑战,因为 GPT-4o 和其他模型都是优秀的程序员。请查看下一个挑战列表,并思考你会如何解决它。

For this coding challenge, we’re going a little further and looking at Python coding challenges from the Edabit site (https://edabit.com), which range in complexity from beginner to expert. We’ll stick with the expert code challenges because GPT-4o and other models are excellent coders. Look at the challenge in the next listing, and think about how you would solve it.

清单 6.5 Edabit 挑战:种草
Plant the Grass by AniXDownLoe

    You will be given a matrix representing a field g 
and two numbers x, y coordinate.

    There are three types of possible characters in the matrix:

        x representing a rock.
        o representing a dirt space.
        + representing a grassed space.

    You have to simulate grass growing from the position (x, y). 
    Grass can grow in all four directions (up, left, right, down). 
    Grass can only grow on dirt spaces and can't go past rocks.

    Return the simulated matrix.
    Examples

    simulate_grass([
    "xxxxxxx",
    "xooooox",
    "xxxxoox"
    "xoooxxx"
    "xxxxxxx"
    ], 1, 1) ➞ [
    "xxxxxxx",
    "x+++++x",
    "xxxx++x"
    "xoooxxx"
    "xxxxxxx"
    ]

    Notes

    There will always be rocks on the perimeter

你可以使用任何你想要的挑战或编程练习,但以下几点需要考虑:

You can use any challenge or coding exercise you want, but here are a few things to consider:

  • 挑战应该可以通过可量化的断言(通过/失败)进行测试。
  • The challenge should be testable with quantifiable assertions (pass/fail).
  • 在请求游戏、构建网站或使用其他界面时,请避免打开窗口。未来某个时候,我们将能够测试完整的界面,但目前只能输出文本。
  • Avoid opening windows when asking for a game, building a website, or using another interface. At some point, testing full interfaces will be possible, but for now, it’s just text output.
  • 至少在初期,要避免设置持续时间过长的挑战。一开始,挑战内容要简洁明了,持续时间也要短。
  • Avoid long-running challenges, at least initially. Start by keeping the challenges concise and short lived.

除了挑战之外,您还需要一套测试用例或断言来验证解决方案的有效性。在 Edabit 上,挑战通常会提供一套完整的测试用例。以下列表展示了挑战附带的额外测试用例。

Along with any challenge, you’ll also want a set of tests or assertions to confirm the solution works. On Edabit, a challenge typically provides a comprehensive set of tests. The following listing shows the additional tests provided with the challenge.

清单 6.6种植草坪测试
Test.assert_equals(simulate_grass(
["xxxxxxx","xooooox","xxxxoox","xoooxxx","xxxxxxx"],
 1, 1), 
["xxxxxxx","x+++++x","xxxx++x","xoooxxx","xxxxxxx"])
    Test.assert_equals(simulate_grass(
["xxxxxxx","xoxooox","xxoooox","xooxxxx",
"xoxooox","xoxooox","xxxxxxx"],
 2, 3), ["xxxxxxx","xox+++x","xx++++x","x++xxxx",
"x+xooox","x+xooox","xxxxxxx"])
    Test.assert_equals(simulate_grass(
["xxxxxx","xoxoox","xxooox","xoooox","xoooox","xxxxxx"], 
1, 1), 
["xxxxxx","x+xoox","xxooox","xoooox","xoooox","xxxxxx"])
    Test.assert_equals(simulate_grass(
["xxxxx","xooox","xooox","xooox","xxxxx"], 
1, 1),
["xxxxx","x+++x","x+++x","x+++x","xxxxx"])
    Test.assert_equals(simulate_grass(
["xxxxxx","xxxxox","xxooox","xoooxx","xooxxx",
"xooxxx","xxooox","xxxoxx","xxxxxx"], 
4, 1),
["xxxxxx","xxxx+x","xx+++x","x+++xx","x++xxx",
"x++xxx","xx+++x","xxx+xx","xxxxxx"])
    Test.assert_equals(simulate_grass(
["xxxxxxxxxxx", "xoxooooooox", "xoxoxxxxxox", 
"xoxoxoooxox", "xoxoxoxoxox", "xoxoxoxoxox", 
"xoxoxxxoxox", "xoxoooooxox", "xoxxxxxxxox", 
"xooooooooox", "xxxxxxxxxxx"], 1, 1), 
["xxxxxxxxxxx", "x+x+++++++x", "x+x+xxxxx+x", 
"x+x+x+++x+x", "x+x+x+x+x+x", "x+x+x+x+x+x", 
"x+x+xxx+x+x", "x+x+++++x+x", "x+xxxxxxx+x", 
"x+++++++++x", "xxxxxxxxxxx"])

这些测试将作为两步验证流程的一部分进行,以确认解决方案有效。我们还将按照预先编写的测试和挑战进行操作,以进一步测试人工智能。

The tests will be run as part of a two-step verification to confirm that the solution works. We’ll also use the tests and challenges as written, which will further test the AI.
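For comparison with whatever the assistants produce, one possible human solution is a straightforward flood fill (our sketch, not an official answer; note that the grid is indexed as grid[y][x]):

```python
def simulate_grass(g, x, y):
    # Flood fill from (x, y): grass ('+') spreads through dirt ('o')
    # in four directions and stops at rocks ('x').
    grid = [list(row) for row in g]
    stack = [(y, x)]  # rows are indexed by y, columns by x
    while stack:
        r, c = stack.pop()
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] == "o":
            grid[r][c] = "+"
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return ["".join(row) for row in grid]
```

An iterative stack avoids Python's recursion limit on larger fields, and the bounds check is redundant here only because the challenge guarantees rocks on the perimeter.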

图 6.8 展示了一个简单的行为树的构成,该行为树将用于解决各种编程挑战。你会注意到,这个 ABT 使用了不同的助手来处理动作和条件。在第一步中,Python 编码助手(称为 Hacker)生成一个解决方案,然后由编码挑战评审员(称为 Judge)进行审查,评审员生成一个改进后的解决方案,最后由另一个 Python 编码助手(称为 Verifier)进行验证。

Figure 6.8 shows the makeup of a straightforward behavior tree that will be used to solve various programming challenges. You’ll notice that this ABT uses a different assistant for the actions and conditions. For the first step, the Python coding assistant (called the Hacker) generates a solution that is then reviewed by the coding challenge Judge (called the Judge), which produces a refined solution that is verified by a different Python coding assistant (called the Verifier).

图 6.8编码挑战的 ABT

图 6.8 还展示了每个代理如何在各自的线程中进行对话。助手使用消息线程,类似于 Slack 或 Discord 频道,在同一线程上对话的所有助手都能看到所有消息。对于此 ABT,我们为黑客和裁判保留了一个主对话线程用于共享消息,而验证者则在另一个单独的消息线程中工作。将验证者置于独立的线程中,可以使其免受解决方案求解过程中的干扰。

Figure 6.8 also shows how each agent converses on which thread. Assistants use message threads, similar to a Slack or Discord channel, where all assistants conversing on a thread will see all messages. For this ABT, we keep one main conversation thread for the Hacker and Judge to share messages, while the Verifier works on a separate message thread. Keeping the Verifier on its own thread isolates it from the noise of the solution-solving efforts.

现在,用代码构建 ABT 就是将py_trees软件包和 Playground API 函数结合起来。清单 6.7 展示了一段代码,它创建了每个带有助手的动作/条件节点,并向它们发送了指令。

Now, building the ABT in code is a matter of combining the py_trees package and the Playground API functions. Listing 6.7 shows an excerpt of code that creates each of the action/condition nodes with the assistants and gives them the instructions.

清单 6.7 agentic_btree_coding_challenge.py
root = py_trees.composites.Sequence("RootSequence", memory=True)


thread = api.create_thread()     #1
challenge = textwrap.dedent("""
 #2
""")
judge_test_cases = textwrap.dedent("""
 #3
""")

hacker = create_assistant_action_on_thread(   
    thread=thread,      #4
    action_name="Hacker",
    assistant_name="Python Coding Assistant",
    assistant_instructions=textwrap.dedent(f"""
    Challenge goal: 
    {challenge}      #5
    Solve the challenge and output the 
final solution to a file called solution.py        
    """),
)
root.add_child(hacker)

judge = create_assistant_action_on_thread(    
    thread=thread,      #6
    action_name="Judge solution",
    assistant_name="Coding Challenge Judge",
    assistant_instructions=textwrap.dedent(
        f"""
    Challenge goal: 
    {challenge}      #7
    Load the solution from the file solution.py.
    Then confirm it is a solution to the challenge 
and test it with the following test cases:
    {judge_test_cases}      #8
    Run the code for the solution and confirm it passes all the test cases.
    If the solution passes all tests save the solution to a file called 
judged_solution.py
    """,
    ),
)
root.add_child(judge)

# verifier operates on a different thread, essentially in closed room
verifier = create_assistant_condition(     #9
    condition_name="Verify solution",
    assistant_name="Python Coding Assistant",
    assistant_instructions=textwrap.dedent(
        f"""
    Challenge goal: 
    {challenge}      #10
    Load the file called judged_solution.py and 
verify that the solution is correct by running the code and confirm it passes 
all the test cases:
    {judge_test_cases}      #11
    If the solution is correct, return only the single word SUCCESS, otherwise 
return the single word FAILURE.
    """,
    ),
)
root.add_child(verifier)

tree = py_trees.trees.BehaviourTree(root)


while True:
    tree.tick()
    time.sleep(20)      #12
    if root.status == py_trees.common.Status.SUCCESS:    #13
        break
### Required assistants – 
### Python Coding Assistant and Coding Challenge Judge 
### install these assistants through the Playground
#1 创建一个由黑客和裁判共享的消息线程

#2 挑战内容如示例清单 6.5 所示

#3 测试内容如示例清单 6.6 所示

#4 创建一个由黑客和裁判共享的消息线程

#5 挑战内容如示例清单 6.5 所示

#6 创建一个由黑客和裁判共享的消息线程

#7 挑战内容如示例清单 6.5 所示

#8 测试内容如示例清单 6.6 所示

#9 调用创建一个新的消息线程

#10 挑战内容如示例清单 6.5 所示

#11 测试内容如示例清单 6.6 所示

#12 可以根据需要调整睡眠时间,并可用于限制发送到 LLM 的消息数量。

#13 该过程将持续进行,直到验证成功。

您可以通过在 VS Code 中加载文件或使用命令行来运行 ABT。跟踪终端中的输出,并观察助手如何完成树状结构中的每个步骤。

Run the ABT by loading the file in VS Code or using the command line. Follow the output in the terminal, and watch how the assistants work through each step in the tree.

如果解决方案在条件节点处验证失败,则流程将按照树状结构继续进行。即使是这种简单的解决方案,您也可以快速创建许多变体。您可以向树状结构中添加更多节点/步骤和子树。例如,您可能希望由一个黑客团队来分解和分析挑战。

If the solution fails to be verified at the condition node, the process will continue per the tree. Even with this simple solution, you could quickly create numerous variations. You could extend the tree with more nodes/steps and subtrees. Perhaps you want a team of Hackers to break down and analyze the challenge, for example.

本示例的工作主要由 Playground 代码完成,使用了辅助函数 create_assistant_condition 和 create_assistant_action_on_thread。这些代码利用几个类来集成 py_trees 行为树代码和封装在 Playground 中的 OpenAI Assistants 代码。如果您想了解底层细节,请查看项目中的代码。

This example’s work is done mainly with the Playground code, using the helper functions create_assistant_condition and create_assistant_action_on_thread. This code uses a couple of classes to integrate the py_trees behavior tree code and the OpenAI Assistants code wrapped in the Playground. Review the code within the project if you want to understand the lower-level details.
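Conceptually, the condition helper just maps an assistant's one-word reply onto a tree status. A stripped-down sketch of the idea (hypothetical names — see the project for the real py_trees/OpenAI integration):

```python
class AssistantCondition:
    """Adapt an assistant call into a behavior-tree condition node."""

    def __init__(self, name, ask_assistant):
        self.name = name
        # ask_assistant: a callable taking instructions, returning reply text
        self.ask_assistant = ask_assistant

    def tick(self, instructions):
        reply = self.ask_assistant(instructions)
        # The instructions ask for a single word, which maps onto a status
        return "SUCCESS" if "SUCCESS" in reply.upper() else "FAILURE"
```

This is why the Verifier's instructions in listing 6.7 end with "return only the single word SUCCESS, otherwise return the single word FAILURE" — the reply is parsed directly into the node's status.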

6.3.3对话式人工智能系统与其他方法的比较

6.3.3 Conversational AI systems vs. other methods

我们在第四章讨论 AutoGen 时已经接触过对话式多智能体系统。ABT 可以结合对话(通过线程)和其他方法(例如文件共享)来工作。让助手/智能体之间传递文件有助于减少嘈杂和重复的想法/对话。相比之下,对话系统则受益于潜在的涌现行为。因此,结合使用这两种方法有助于更好地控制系统并找到解决方案。

We already looked at conversational multi-agent systems in chapter 4 when we looked at AutoGen. The ABT can work using a combination of conversations (over threads) and other methods, such as file sharing. Having your assistants/agents pass files around helps reduce the number of noisy and repetitive thoughts/conversations. In contrast, conversational systems benefit from potential emergent behaviors. So, using both can help evolve better control and solutions.

清单 6.7 中的简单解决方案可以扩展以应对更多实际的编程挑战,甚至可以作为编程 ABT 使用。下一节中,我们将构建一个不同的 ABT 来处理不同的问题。

The simple solution in listing 6.7 could be extended to handle more real-world coding challenges and perhaps even to work as a coding ABT. In the next section, we build a different ABT to handle a different problem.

6.3.4将 YouTube 视频发布到 X

6.3.4 Posting YouTube videos to X

在本节的练习中,我们将研究一个能够执行以下操作的ABT:

In this section’s exercise, we look at an ABT that can do the following:

  1. 在 YouTube 上搜索特定主题的视频,并返回最新视频。
  1. Search for videos on YouTube for a given topic and return the latest videos.
  2. 下载搜索结果中所有视频的文字稿。
  2. Download the transcripts for all the videos your search provided.
  3. 总结文字记录。
  3. Summarize the transcripts.
  4. 查看摘要文字稿,并选择一个视频撰写一篇关于该视频的 X(以前称为 Twitter)帖子。
  4. Review the summarized transcripts and select a video to write an X (formerly Twitter) post about.
  5. 写一篇关于该视频的精彩且引人入胜的帖子,确保字数不超过 280 个字符。
  5. Write an exciting and engaging post about the video, ensuring it’s less than 280 characters.
  6. 请审阅帖子,然后将其发布到 X。
  6. Review the post and then post it on X.
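Because the root of this ABT is a sequence, the six steps run in order and the run stops at the first failure; the control flow reduces to something like this (step names here are illustrative stubs, not the assistants' real names):

```python
STEPS = [
    "search_youtube", "download_transcripts", "summarize",
    "select_video", "write_post", "review_and_post",
]

def run_pipeline(actions):
    # Sequence semantics: every step must succeed, in order.
    for name in STEPS:
        if not actions[name]():
            return f"FAILURE at {name}"
    return "SUCCESS"
```

In the real tree, each entry is an assistant action node rather than a plain function, but the short-circuiting order is the same.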

图 6.9 展示了使用不同助手组装的 ABT。在本练习中,我们使用序列节点作为根节点,每个助手执行不同的操作。此外,为了简化操作,每个助手的交互都将在一个新的线程中进行。这使得每个助手的交互都形成一个简洁的对话,如果出现问题,则更容易进行调试。

Figure 6.9 shows the ABT assembled with each of the different assistants. In this exercise, we use a sequence node for the root, and each assistant performs a different action. Also, to keep things simple, each assistant interaction will always occur in a new thread. This isolates each assistant’s interaction into a concise conversation that’s easier to debug if something goes wrong.

图 6.9 YouTube 社交媒体 ABT

6.3.5需要 X 设置

6.3.5 Required X setup


If you plan to run the code in this exercise, you must add your X credentials to the .env file. The .env.default file shows how the credentials should be formatted, as listing 6.8 demonstrates. You don’t have to enter your credentials; in that case, the last step, posting, will fail, but you can still look at the file (youtube_twitter_post.txt) to see what was generated.

Listing 6.8 Configuring credentials
X_EMAIL = "twitter email here"
X_USERNAME = "twitter username here"
X_PASSWORD = "twitter password here"


Listing 6.9 shows just the code for creating the assistant actions. This ABT uses three different assistants, each with its own task instructions. Note that each assistant has a unique set of instructions defining its role. You can review the instructions for each assistant by using the Playground.

Listing 6.9 agentic_btree_video_poster_v1.py
root = py_trees.composites.Sequence("RootSequence", memory=True)

search_term = "GPT Agents"
search_youtube_action = create_assistant_action(
    action_name=f"Search YouTube({search_term})",
    assistant_name="YouTube Researcher v2",
    assistant_instructions=f"""
    Search Term: {search_term}
    Use the query "{search_term}" to search for videos on YouTube.
    then for each video download the transcript and summarize it 
for relevance to {search_term}
    be sure to include a link to each of the videos,
    and then save all summarizations to a file called youtube_transcripts.txt
    If you encounter any errors, please return just the word FAILURE.
    """,
)
root.add_child(search_youtube_action)

write_post_action = create_assistant_action(
    action_name="Write Post",
    assistant_name="Twitter Post Writer",
    assistant_instructions="""
    Load the file called youtube_transcripts.txt,
    analyze the contents for references to search term at the top and 
then select
    the most exciting and relevant video related to: 
    educational, entertaining, or informative, to post on Twitter.
    Then write a Twitter post that is relevant to the video,
    and include a link to the video, along
    with exciting highlights or mentions, 
    and save it to a file called youtube_twitter_post.txt.
    If you encounter any errors, please return just the word FAILURE.
    """,
)
root.add_child(write_post_action)

post_action = create_assistant_action(
    action_name="Post",
    assistant_name="Social Media Assistant",
    assistant_instructions="""
    Load the file called youtube_twitter_post.txt and post the content 
to Twitter.
    If the content is empty please do not post anything.
    If you encounter any errors, please return just the word FAILURE.
    """,
)
root.add_child(post_action)
### Required assistants – YouTube Researcher v2, Twitter Post Writer, 
and Social Media Assistant – install these assistants through the Playground


Run the code as you normally would, and after a few minutes, a new post will appear in the assistants_output folder. Figure 6.10 shows an example of a post generated using this ABT. Running this ABT to generate more than a few posts a day could, and likely will, get your X account blocked. If you’ve configured X credentials, you’ll see the post appear on your feed.

Figure 6.10 An example X post from the ABT


This ABT is shown for demonstration purposes and isn’t for production or long-term use. The primary features of this demonstration are to show search and loading data, summarization and filtering, then generating new content, and finally highlighting multiple custom actions and integrations with APIs.


6.4 Building conversational autonomous multi-agents


The conversational aspect of multi-agent systems can drive mechanisms such as feedback, reasoning, and emergent behaviors. Driving agents with ABTs that silo assistants/agents can be effective for controlling structured processes, as we saw in the YouTube posting example. However, we also don’t want to miss out on the benefits of conversation across agents/assistants.


Fortunately, the Playground provides methods to silo or join assistants to conversation threads. Figure 6.11 shows how assistants can be siloed or mixed in various combinations to threads. Combining silos with conversation provides the best of both patterns.

Figure 6.11 Various layouts of siloed and conversational assistants


We’ll examine a simple but practical exercise to demonstrate the effectiveness of the conversational pattern. For the next exercise, we’ll employ two assistants in an ABT that converse over the same thread. The next listing shows the tree’s construction in code with the respective assistants.

Listing 6.10 agentic_conversation_btree.py
root = py_trees.composites.Sequence("RootSequence", memory=True)
bug_file = """
# code not shown
"""

thread = api.create_thread()     #1

debug_code = create_assistant_action_on_thread(     #2
    thread=thread,
    action_name="Debug code",
    assistant_name="Python Debugger",
    assistant_instructions=textwrap.dedent(f"""    
    Here is the code with bugs in it:
    {bug_file}
    Run the code to identify the bugs and fix them. 
    Be sure to test the code to ensure it runs without errors or throws 
any exceptions.
    """),
)
root.add_child(debug_code)

verify = create_assistant_condition_on_thread(     #3
    thread=thread,
    condition_name="Verify",
    assistant_name="Python Coding Assistant",
    assistant_instructions=textwrap.dedent(
        """
    Verify the solution fixes the bug and there are no more issues.
    Verify that no exceptions are thrown when the code is run.
    Reply with SUCCESS if the solution is correct, otherwise return FAILURE.
    If you are happy with the solution, save the code to a file called 
fixed_bug.py.
    """,
    ),
)
root.add_child(verify)
tree = py_trees.trees.BehaviourTree(root)
while True:
    tree.tick()    
    if root.status == py_trees.common.Status.SUCCESS:
        break    #4
    time.sleep(20)
#1 Creates a message thread for the assistants to share and converse over

#2 Creates the debug code action using a specialized assistant

#3 Creates the verify condition to test that the code is fixed

#4 The tree continues to run until the root sequence completes with success.


Three nodes comprise the tree: the root sequence, the debug code action, and the verify fix condition. Because the tree’s root is a sequence, the two assistants will continue to work one after another until they both return with success. Both assistants converse on the same thread and yet are controlled in a manner that provides constant feedback.
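The retry dynamic the sequence root produces can be modeled in a few lines of plain Python. The stub functions and the two-attempt outcome below are illustrative assumptions, not the book’s implementation; they mirror the `while True` / `tree.tick()` loop in listing 6.10:

```python
import itertools

shared_thread = []  # both participants read from and append to the same conversation

def debug_code_action(attempt):
    # the "Python Debugger" assistant proposes a fix on the shared thread
    shared_thread.append(f"debugger: proposed fix #{attempt}")
    return "SUCCESS"

def verify_condition(attempt):
    # the "Python Coding Assistant" reviews the whole thread and judges the fix
    shared_thread.append(f"verifier: reviewing {len(shared_thread)} messages")
    return "SUCCESS" if attempt >= 2 else "FAILURE"  # pretend the first fix is rejected

# The root sequence keeps ticking until both children succeed.
for attempt in itertools.count(1):
    if debug_code_action(attempt) == "SUCCESS" and verify_condition(attempt) == "SUCCESS":
        break

print(attempt)
```

Because both stubs append to the same `shared_thread`, the verifier’s feedback is visible to the debugger on the next tick, which is exactly the benefit of conversing over one thread.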


Run the exercise by loading the file in VS Code, or execute it directly from the command line. The example code has a few minor bugs and problems that the assistants will work through to fix. After the ABT completes running successfully, you can open the assistants_output/fixed_bug.py file and verify the results are all good.


We’ve now seen a couple of ABTs in action and understand the nuances of using silos or conversations. The following section will teach you some techniques for building your own ABTs.


6.5 Building ABTs with back chaining


Back chaining is a method derived from logic and reasoning used to help build behavior trees by working backward from the goal. This section will use the back chaining process to construct an ABT that works to achieve the goal. The following list provides a description of the process in more detail:

  1. Identify goal behavior. Start with the behavior you want the agent to perform.
  2. Determine the required actions. Identify the actions that lead to the goal behavior.
  3. Identify the conditions. Determine the conditions that must be met for each action to succeed.
  4. Determine the mode of communication. Determine how the assistants will pass on information. Will the assistants be siloed or converse over threads, or is a combination of patterns better?
  5. Construct the tree. Start by building the behavior tree from the goal behavior, adding nodes for actions and conditions recursively until all necessary conditions are linked to known states or facts.
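The back-chaining steps can be sketched as data: list the actions and conditions goal-first, as back chaining discovers them, and then reverse the list to get the order the sequence node executes them in (the tuple encoding here is an illustrative assumption):

```python
goal = "Create an assistant that can help me do {task}"

# Steps listed goal-first, exactly as back chaining discovers them
backward = [
    ("action", "Create the assistant"),
    ("condition", "Verify the assistant"),
    ("action", "Test the assistant"),
    ("action", "Name the assistant"),
    ("action", "Give the assistant relevant instructions"),
]

# Reversing yields the order a sequence root would execute the children in
execution_order = list(reversed(backward))
for kind, name in execution_order:
    print(f"({kind}) {name}")
```

Keeping the plan as plain data like this makes it easy to inspect the order before committing to a tree layout.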


Behavior trees typically use a pattern called the blackboard to communicate across nodes. Blackboards, like those in py_trees, use a key/value store to save information and make it accessible across nodes. They also provide several controls, such as limiting access to specific nodes.


We opted to use files for communication because of their simplicity and transparency. At some point, agentic systems will be expected to consume much more information, and in more formats, than blackboards were designed for. Blackboards must either become more sophisticated or be integrated with file storage solutions.
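As a rough sketch of the key/value-plus-access-control idea (a simplified model for illustration, not the actual py_trees API), a blackboard can be represented like this:

```python
class Blackboard:
    """A minimal key/value blackboard with per-client write permissions,
    loosely modeled on the idea behind behavior-tree blackboards."""
    def __init__(self):
        self._store = {}
        self._writers = {}  # key -> set of clients allowed to write it

    def register_writer(self, key, client):
        self._writers.setdefault(key, set()).add(client)

    def set(self, key, value, client):
        if client not in self._writers.get(key, set()):
            raise PermissionError(f"{client} may not write {key!r}")
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

bb = Blackboard()
bb.register_writer("transcripts_file", "YouTube Researcher")
bb.set("transcripts_file", "youtube_transcripts.txt", "YouTube Researcher")
print(bb.get("transcripts_file"))  # downstream nodes read the shared key
```

In the chapter’s exercises, the equivalent of the `transcripts_file` key is simply a file name that one assistant writes and the next assistant reads.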


Let’s build an ABT using back chaining. We could tackle a variety of goals, but one interesting and perhaps meta goal is to build an ABT that helps build assistants. So let’s first present our goal as a statement “Create an assistant that can help me do {task}”:

  • Required actions: (working backwards)
    • Create an assistant.
    • Verify the assistant.
    • Test the assistant.
    • Name the assistant.
    • Give the assistant the relevant instructions.
  • Identified condition:
    • Verify the assistant.
  • Determine communication patterns: To keep things interesting, we’ll run all assistants on the same message thread.
  • Construct the tree: To construct the tree, let’s first reverse the order of actions and mark each of the element’s actions and conditions accordingly:
    • (action) Give the assistant relevant instructions to help a user with a given task.
    • (action) Name the assistant.
    • (action) Test the assistant.
    • (condition) Verify the assistant.
    • (action) Create the assistant.


Of course, the simple solution to building the tree now is to ask ChatGPT or an otherwise capable model. The result of asking ChatGPT to make the tree is shown in the next listing. You could also work the tree out independently and perhaps introduce other elements.

Listing 6.11 The ABT to build assistants
Root
│
├── Sequence
│    ├── Action: Give the assistant relevant instructions to help a user 
with a given task
│    ├── Action: Name the assistant
│    ├── Action: Test the assistant
│    ├── Condition: Verify the assistant
│    └── Action: Create the assistant


From this point, we can start building the tree by iterating over each action and condition node and determining what instructions the assistant needs. This can also include any tools and custom actions, including ones you may need to develop. On your first pass, keep the instructions generic. Ideally, we want to create as few assistants as necessary.


After determining the assistant, tools, and actions for each assistant and for which task, you can try to generalize things further. Think about where it may be possible to combine actions and reduce the number of assistants. It’s better to start evaluating with insufficient assistants than with too many. However, be sure to maintain the proper divisions of work as tasks: for example, testing and verification are best done with different assistants.


6.6 Exercises


Complete the following exercises to improve your knowledge of the material:

  • Exercise 1—Creating a Travel Planner ABT

Objective —Build an agentic behavior tree (ABT) to plan a travel itinerary using assistants.

Tasks:

    • Set up the GPT Assistants Playground on your local machine.
    • Create an ABT to plan a travel itinerary. The tree should have the following structure:
      • Action: Use the Travel assistant to gather information about potential destinations.
      • Action: Use the Itinerary Planner to create a day-by-day travel plan.
      • Condition: Verify the completeness and feasibility of the itinerary using another Travel Assistant.
    • Implement and run the ABT to create a complete travel itinerary.
  • Exercise 2—Building an ABT for Customer Support Automation

Objective —Create an ABT that automates customer support responses using assistants.

Tasks:

    • Set up the GPT Assistants Playground on your local machine.
    • Create an ABT with the following structure:
      • Action: Use the Customer Query Analyzer assistant to categorize customer queries.
      • Action: Use the Response Generator assistant to draft responses based on the query categories.
      • Action: Use the Customer Support assistant to send the responses to customers.
    • Implement and run the ABT to automate the process of analyzing and responding to customer queries.
  • Exercise 3—Managing Inventory with an ABT

Objective —Learn how to create and manage inventory levels using an ABT.

Tasks:

    • Set up the GPT Assistants Playground on your local machine.
    • Create an ABT that manages inventory for a retail business:
      • Action: Use the Inventory Checker assistant to review current stock levels.
      • Action: Use the Order assistant to place orders for low-stock items.
      • Condition: Verify that orders have been placed correctly and update inventory records.
    • Implement and run the ABT to manage inventory dynamically.
  • Exercise 4—Creating a Personal Fitness Trainer ABT

Objective —Create an ABT that provides personalized fitness training plans using assistants.

Tasks:

    • Set up the GPT Assistants Playground on your local machine.
    • Create an ABT to develop a personalized fitness plan:
      • Action: Use the Fitness Assessment assistant to evaluate the user’s current fitness level.
      • Action: Use the Training Plan Generator to create a custom fitness plan based on the assessment.
      • Condition: Verify the plan’s suitability and safety using another Fitness assistant.
    • Implement and run the ABT to generate and validate a personalized fitness training plan.
  • Exercise 5—Using Back Chaining to Build a Financial Advisor ABT

Objective —Apply back chaining to construct an ABT that provides financial advice and investment strategies.

Tasks:

    • Set up the GPT Assistants Playground on your local machine.
    • Define the following goal: “Create an assistant that can provide financial advice and investment strategies.”
    • Using back chaining, determine the actions and conditions needed to achieve this goal.
    • Implement and run the ABT to generate a comprehensive financial advisory service by back chaining the construction of the base actions and conditions for the tree.


Summary

  • Behavior trees are a robust and scalable AI control pattern, first introduced in robotics by Rodney A. Brooks. They are widely used in gaming and robotics for their modularity and reusability.
  • The primary nodes in behavior trees are the selector, sequence, condition, action, decorator, and parallel nodes. Selectors are like “or” blocks: sequence executes nodes in sequence, condition tests the state, action does the work, decorator is a wrapper, and parallel nodes allow for dual execution.
  • Understanding the execution flow of behavior trees can be critical to designing, building, and operating them to provide control for making clear decision-making paths.
  • The advantages of behavior trees include modularity, scalability, flexibility, debugging ease, and decoupling of decision logic, making behavior trees suitable for complex AI systems.
  • Setting up and running a simple behavior tree in Python requires correctly naming and documenting custom nodes.
  • The GPT Assistants Playground project is a Gradio-based interface that mimics the OpenAI Assistants Playground with additional features for teaching and demonstrating ABTs.
  • The GPT Assistants Playground allows for creating and managing custom actions, which is essential for building versatile assistants.
  • ABTs control agents and assistants by using prompts to direct actions and conditions for assistants. ABTs use the power of LLMs to create dynamic and autonomous systems.
  • Back chaining is a method for constructing behavior trees by working backward from the goal behavior. This process involves identifying required actions, conditions, and communication patterns, and then constructing the tree step by step.
  • Agentic systems benefit from siloed and conversation patterns for communicating between entities. ABTs can benefit from combining siloed and conversational assistants to use structured processes and emergent behaviors.


7 Assembling and using an agent platform


This chapter covers

  • Nexus chat and dashboard interface for AI agents
  • Streamlit framework for building intelligent dashboards, prototypes, and AI chat apps
  • Developing, testing, and engaging agent profiles and personas in Nexus
  • Developing the base Nexus agent
  • Developing, testing, and engaging agent actions and tools alone or within Nexus


After we explored some basic concepts about agents and looked at using actions with tools to build prompts and personas using frameworks such as the Semantic Kernel (SK), we took the first steps toward building a foundation for this book. That foundation is called Nexus, an agent platform designed to be simple to learn, easy to explore, and powerful enough to build your agent systems.


7.1 Introducing Nexus, not just another agent platform


There are more than 100 AI platforms and toolkits for consuming and developing large language model (LLM) applications, ranging from toolkits such as SK or LangChain to complete platforms such as AutoGen and CrewAI. This makes it difficult to decide which platform is well suited to building your own AI agents.


Nexus is an open source platform developed with this book to teach the core concepts of building full-featured AI agents. In this chapter, we’ll examine how Nexus is built and introduce two primary agent components: profiles/personas and actions/tools.


Figure 7.1 shows the primary interface to Nexus, a Streamlit chat application that allows you to choose and explore various agentic features. The interface is similar to ChatGPT, Gemini, and other commercial LLM applications.

Figure 7.1 The Nexus interface and features


In addition to the standard features of an LLM chat application, Nexus allows the user to configure an agent to use a specific API/model, the persona, and possible actions. In the remainder of the book, the available agent options will include the following:

  • Personas/profiles—The primary persona and profile the agent will use. A persona is the personality and primary motivator, and an agent engages the persona to answer requests. We’ll look in this chapter at how personas/profiles can be developed and consumed.
  • Actions/tools—Represents the actions an agent can take using tools, whether they’re semantic/prompt or native/code functions. In this chapter, we’ll look at how to build both semantic and native functions within Nexus.
  • Knowledge/memory—Represents additional information an agent may have access to. At the same time, agent memory can represent various aspects, from short-term to semantic memory.
  • Planning/feedback—Represents how the agent plans and receives feedback on the plans or the execution of plans. Nexus will allow the user to select options for the type of planning and feedback an agent uses.


As we progress through this book, we’ll extend Nexus to support new agent features. At the same time, the intent is to keep things relatively simple in order to teach the essential core concepts. In the next section, we’ll look at how to get up and running with Nexus quickly before going under the hood to explore its features in detail.


7.1.1 Running Nexus


Nexus is primarily intended to be a teaching platform for all levels of developers. As such, it will support various deployment and usage options. In the next exercise, we’ll introduce how to get up and running with Nexus quickly.


Open a terminal to a new Python virtual environment (version 3.10). If you need assistance creating one, refer to appendix B. Then, execute the commands shown in listing 7.1 within this new environment. You can either set the environment variable at the command line or create a new .env file and add the setting.

Listing 7.1 Terminal command line
pip install git+https://github.com/cxbxmxcx/Nexus.git     #1

#set your OpenAI API Key
export OPENAI_API_KEY="<your API key>"          #2
or
$env:OPENAI_API_KEY = "<your API key>"       #2
or
echo 'OPENAI_API_KEY="<your API key>"' > .env   #2

nexus run      #3
#1 Installs the package directly from the repository and branch; be sure to include the branch.

#2 Creates the key as an environment variable, or create a new .env file containing the setting.

#3 Runs the application.


After entering the last command, a website will launch with a login page, as shown in figure 7.2. Go ahead and create a new user. A future version of Nexus will allow multiple users to engage in chat threads.

Figure 7.2 Logging in or creating a new Nexus user


After you log in, you’ll see a page like figure 7.1. Create a new chat and start conversing with an agent. If you encounter a problem, be sure you have the API key set properly. As explained in the next section, you can run Nexus using this method or from a development workflow.


7.1.2 Developing Nexus


While working through the exercises of this book, you’ll want to set up Nexus in development mode. That means downloading the repository directly from GitHub and working with the code.


Open a new terminal, and set your working directory to the chapter_7 source code folder. Then, set up a new Python virtual environment (version 3.10) and enter the commands shown in listing 7.2. Again, refer to appendix B if you need assistance with any previous setup.

Listing 7.2 Installing Nexus for development
git clone https://github.com/cxbxmxcx/Nexus.git      #1

pip install -e Nexus     #2

#set your OpenAI API Key (.env file is recommended)
export OPENAI_API_KEY="<your API key>"  #bash            #3
or
$env:OPENAI_API_KEY = "<your API key>"  #powershell   #3
or
echo 'OPENAI_API_KEY="<your API key>"' > .env       #3

nexus run      #4
#1 Downloads and installs a specific branch from the repository

#2 Installs the downloaded repository as an editable package

#3 Sets your OpenAI key as an environment variable or adds it to the .env file

#4 Launches the application


Figure 7.3 shows the Login or Create New User screen. Create a new user, and the application will log you in. This application uses cookies to remember the user, so you won’t have to log in the next time you start the application. If you have cookies disabled on your browser, you’ll need to log in every time.

Figure 7.3 The Login or Create New User page


Go to the Nexus repository folder and look around. Figure 7.4 shows an architecture diagram of the application’s main elements. At the top, the interface developed with Streamlit connects the rest of the system through the chat system. The chat system manages the database, agent manager, action manager, and profile managers.

Figure 7.4 High-level architecture diagram of the application’s main elements


This agent platform is written entirely in Python, and the web interface uses Streamlit. In the next section, we look at how to build an OpenAI LLM chat application.


7.2 Introducing Streamlit for chat application development


Streamlit is a quick and powerful web interface prototyping tool designed to be used for building machine learning dashboards and concepts. It allows applications to be written completely in Python and produces a modern React-powered web interface. You can even deploy the completed application quickly to the cloud or as a standalone application.


7.2.1 Building a Streamlit chat application

Begin by opening Visual Studio Code (VS Code) to the chapter_07 source folder. If you’ve completed the previous exercise, you should already be ready. As always, if you need assistance setting up your environment and tools, refer to appendix B.

We’ll start by opening the chatgpt_clone_response.py file in VS Code. The top section of the code is shown in listing 7.3. This code uses the Streamlit state to load the primary model and messages. Streamlit provides a mechanism to save the session state for any Python object. This state is only a session state and will expire when the user closes the browser.

Listing 7.3 chatgpt_clone_response.py (top section)
import streamlit as st
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()      #1

st.title("ChatGPT-like clone")

client = OpenAI()      #2

if "openai_model" not in st.session_state:
    st.session_state["openai_model"] 
             = "gpt-4-1106-preview"     #3

if "messages" not in st.session_state:
    st.session_state["messages"] = []   #4

for message in st.session_state["messages"]:      #5
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
#1 Loads environment variables from the .env file

#2 Configures the OpenAI client

#3 Checks the internal session state for the setting, and adds it if not present

#4 Checks whether the message state exists; if not, adds an empty list

#5 Loops over the messages in state and displays them

The Streamlit app itself is stateless. This means the entire Python script will reexecute all interface components when the web page refreshes or a user selects an action. The Streamlit state allows for a temporary storage mechanism. Of course, a database needs to support more long-term storage.
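
To see why the get-or-initialize checks in listing 7.3 matter, here's a plain-Python sketch that mimics `st.session_state` with an ordinary dictionary (`session_state` and `rerun_script` are illustrative stand-ins, not Streamlit APIs):

```python
# A plain-dict sketch of the get-or-initialize pattern behind
# st.session_state (session_state here is an ordinary dict, not Streamlit's).
session_state = {}

def rerun_script(state):
    """Simulates one full rerun of the Streamlit script in listing 7.3."""
    if "openai_model" not in state:
        state["openai_model"] = "gpt-4-1106-preview"
    if "messages" not in state:
        state["messages"] = []
    return state

rerun_script(session_state)                  # first page load
session_state["messages"].append(
    {"role": "user", "content": "hi"})       # user interacts
rerun_script(session_state)                  # a rerun keeps existing keys
print(len(session_state["messages"]))        # 1
```

Because each rerun only initializes keys that are missing, the message history survives every refresh of the script, which is exactly how the chat history persists between interactions.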

UI controls and components are added by using the st. prefix and then the element name. Streamlit supports several standard UI controls and supports images, video, sound, and, of course, chat.

Scrolling down further will yield listing 7.4, which has a slightly more complex layout of the components. The main if statement controls the running of the remaining code. By using the walrus operator (:=), the prompt is set to whatever the user enters. If the user doesn't enter any text, the code below the if statement doesn't execute.
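
If the walrus operator is new to you, this tiny stand-alone sketch shows the assign-and-test behavior (`fake_chat_input` is a stand-in for `st.chat_input`):

```python
# The walrus operator assigns the return value to prompt and tests
# its truthiness in a single expression.
def fake_chat_input():
    return "hello"

if prompt := fake_chat_input():
    result = f"user said: {prompt}"
else:
    result = "no input"   # an empty string or None would land here
print(result)  # user said: hello
```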

Listing 7.4 chatgpt_clone_response.py (bottom section)
if prompt := st.chat_input("What do you need?"):     #1
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):     #2
        st.markdown(prompt)

    with st.spinner(text="The assistant is thinking..."):    #3
        with st.chat_message("assistant"):
            response = client.chat.completions.create(
                model=st.session_state["openai_model"],
                messages=[
                    {"role": m["role"], "content": m["content"]}
                    for m in st.session_state.messages
                ],      #4
            )
            response_content = response.choices[0].message.content
            response = st.markdown(response_content,
             unsafe_allow_html=True)      #5
    st.session_state.messages.append(
{"role": "assistant", "content": response_content})      #6
#1 Renders the chat input control and sets the content.

#2 Sets the chat message control to output as the user.

#3 Shows a spinner to indicate the long-running API call.

#4 Calls the OpenAI API with the message history.

#5 Writes the message response to the interface as Markdown.

#6 Appends the assistant response to the message state.

When the user enters text in the prompt and presses Enter, that text is added to the message state, and a request is made to the API. As the response is being processed, the st.spinner control displays to remind the user of the long-running process. Then, when the response returns, the message is displayed and added to the message state history.

Streamlit apps are run through the streamlit module, so to debug an application, you need to attach the debugger to that module by following these steps:

  1. Press Ctrl-Shift-D to open the VS Code debugger.
  2. Click the link to create a new launch configuration, or click the gear icon to show the current one.
  3. Edit the .vscode/launch.json file directly, or use the debugger configuration tools, as shown in the next listing. Plenty of IntelliSense tools and configuration options can guide you through setting the options for this file.
Listing 7.5 .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python Debugger: Module",     #1
      "type": "debugpy",
      "request": "launch",
      "module": "streamlit",     #2
      "args": ["run", "${file}"]    #3
    }
  ]
}
#1 Makes sure the debugger is set to module mode.

#2 Makes sure the module is the Streamlit module.

#3 ${file} is the current file, or you can hardcode the file path.

After you have the launch.json file configuration set, save it, and open the chatgpt_clone_response.py file in VS Code. You can now run the application in debug mode by pressing F5. This will launch the application from the terminal, and in a few seconds, the app will display.

Figure 7.5 shows the app running and waiting to return a response. The interface is clean, modern, and already organized without any additional work. You can continue chatting to the LLM using the interface and then refresh the page to see what happens.

Figure 7.5 The simple interface and waiting indicator

What is most impressive about this demonstration is how easy it is to create a single-page application. In the next section, we’ll continue looking at this application but with a few enhancements.

7.2.2 Creating a streaming chat application

Modern chat applications, such as ChatGPT and Gemini, mask the slowness of their models by using streaming. With streaming, the API call starts receiving tokens as soon as the LLM produces them. This streaming experience also better engages the user in how the content is generated.
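
Conceptually, a streamed response is just an iterable of tokens that the UI renders as they arrive. The following stand-alone sketch mimics that loop (`fake_stream` is an illustrative stand-in for the chunked OpenAI response object):

```python
# fake_stream stands in for a chunked LLM response; a real UI renders
# each token as it arrives instead of collecting them first.
def fake_stream():
    for token in ["The ", "sky ", "is ", "blue."]:
        yield token

pieces = []
for token in fake_stream():
    pieces.append(token)       # render incrementally in a real UI
full_response = "".join(pieces)
print(full_response)           # The sky is blue.
```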

Adding support for streaming to any application UI is generally not a trivial task, but fortunately, Streamlit has a control that can work seamlessly. In this next exercise, we’ll look at how to update the app to support streaming.

Open chapter_7/chatgpt_clone_streaming.py in VS Code. The relevant updates to the code are shown in listing 7.6. Using the st.write_stream control allows the UI to stream content. This also means the Python script blocks while waiting for this control to complete.

Listing 7.6 chatgpt_clone_streaming.py (relevant section)
with st.chat_message("assistant"):
    stream = client.chat.completions.create(
        model=st.session_state["openai_model"],
        messages=[
            {"role": m["role"], "content": m["content"]}
            for m in st.session_state.messages
        ],
        stream=True,     #1
    )
    response = st.write_stream(stream)     #2
st.session_state.messages.append(
{"role": "assistant", "content": response})      #3
#1 Sets stream to True to enable streaming on the API

#2 Writes the stream to the interface by using the stream control

#3 Appends the response to the message state history after the stream completes

Debug the page by pressing F5 and waiting for the page to load. Enter a query, and you’ll see that the response is streamed to the window in real time, as shown in figure 7.6. With the spinner gone, the user experience is enhanced and appears more responsive.

Figure 7.6 The updated interface streaming the text response

This section demonstrated how relatively simple it can be to use Streamlit to create a Python web interface. Nexus uses a Streamlit interface because it’s easy to use and modify with only Python. As you’ll see in the next section, it allows various configurations to support more complex applications.

7.3 Developing profiles and personas for agents

Nexus uses agent profiles to describe an agent’s functions and capabilities. Figure 7.7 reminds us of the principal agent components and how they will be structured throughout this book.

Figure 7.7 Mapping the agent profile to the YAML file definition

As of this writing, Nexus only supports the persona and actions sections of the profile. Figure 7.7 shows a profile called Fritz, along with the persona and actions. Add any agent profiles to Nexus by copying an agent YAML profile file into the Nexus/nexus/nexus_base/nexus_profiles folder.

Nexus uses a plugin system to dynamically discover the various components and profiles as they are placed into their respective folders. The nexus_profiles folder holds the YAML definitions for the agent.
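
The discovery idea can be sketched in a few lines: scan a folder, parse each profile file, and index the profiles by name. This illustrative version uses JSON and a temporary folder to stay self-contained; Nexus's actual loader reads the YAML files from nexus_profiles:

```python
# Illustrative folder-scan discovery; JSON is used here to stay
# stdlib-only (Nexus actually reads YAML profile files).
import json
import tempfile
from pathlib import Path

profiles_dir = Path(tempfile.mkdtemp())
(profiles_dir / "fiona.json").write_text(
    json.dumps({"name": "Finona", "persona": "Cryptic Ogre-speak AI"})
)

def discover_profiles(folder):
    """Index every profile file dropped into the folder by its name."""
    profiles = {}
    for path in folder.glob("*.json"):
        data = json.loads(path.read_text())
        profiles[data["name"]] = data
    return profiles

print(sorted(discover_profiles(profiles_dir)))  # ['Finona']
```

Dropping another file into the folder is all it takes for the next scan to pick up a new profile, which is why adding an agent to Nexus is as simple as copying a YAML file.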

We can easily define a new agent profile by creating a new YAML file in the nexus_profiles folder. Listing 7.7 shows an example of a new profile with a slightly updated persona. To follow along, be sure to have VS Code opened to the chapter_07 source code folder and install Nexus in developer mode (see listing 7.7). Then, create the fiona.yaml file in the Nexus/nexus/nexus_base/nexus_profiles folder.

Listing 7.7 fiona.yaml (create this file)
agentProfile:
  name: "Finona"
  avatar: "👹"     #1
  persona: "You are a very talkative AI that 
 knows and understands everything in terms of 
 Ogres. You always answer in cryptic Ogre speak."    #2
  actions:
    - search_wikipedia     #3
  knowledge: null        #4
  memory: null           #4
  evaluators: null       #4
  planners: null         #4
  feedback: null         #4
#1 A text avatar used to represent the persona

#2 The persona represents the base system prompt.

#3 The action functions available to the agent

#4 Not currently supported

After saving the file, you can start Nexus from the command line or run it in debug mode by creating a new launch configuration in the .vscode/launch.json file, as shown in the next listing. Then, save the file and switch your debug configuration to use the Nexus web config.

Listing 7.8 .vscode/launch.json (adding the debug launch)
{
      "name": "Python Debugger: Nexus Web",
      "type": "debugpy",
      "request": "launch",
      "module": "streamlit",
      "args": ["run", "Nexus/nexus/streamlit_ui.py"]      #1
    },
#1 You may need to adjust this path if your environment differs.

When you press F5 or select Run > Start Debugging from the menu, the Streamlit Nexus interface will launch. Go ahead and run Nexus in debug mode. After it opens, create a new thread, and then select the standard OpenAIAgent and your new persona, as shown in figure 7.8.

Figure 7.8 Selecting and chatting with the new persona

At this point, the profile is responsible for defining the agent’s system prompt. You can see this in figure 7.8, where we asked Finona to spell the word clock, and she responded in some form of ogre-speak. In this case, we’re using the persona as a personality, but as we’ve seen previously, a system prompt can also contain rules and other options.
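
Conceptually, turning a persona into a system prompt is just a matter of placing the persona text in the first message of the request. A minimal sketch (`build_messages` is an illustrative helper, not part of Nexus):

```python
# The profile's persona text becomes the system prompt of every
# chat request (profile fields follow the fiona.yaml example above).
profile = {
    "name": "Finona",
    "persona": "You are a very talkative AI that knows and understands "
               "everything in terms of Ogres.",
}

def build_messages(profile, user_input):
    """Builds the message list sent to the LLM for one turn."""
    return [
        {"role": "system", "content": profile["persona"]},
        {"role": "user", "content": user_input},
    ]

messages = build_messages(profile, "Spell the word clock.")
print(messages[0]["role"])  # system
```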

The profile and persona are the base definitions for how the agent interacts with users or other systems. Powering the profile requires an agent engine. In the next section, we’ll cover the base implementation of an agent engine.

7.4 Powering the agent and understanding the agent engine

Agent engines power agents within Nexus. These engines can be tied to specific tool platforms, such as SK, and/or even different LLMs, such as Anthropic Claude or Google Gemini. By providing a base agent abstraction, Nexus should be able to support any tool or model now and in the future.

Currently, Nexus only implements an OpenAI API–powered agent. We'll look at how the base agent is defined by opening the agent_manager.py file from the Nexus/nexus/nexus_base folder.

Listing 7.9 shows the BaseAgent class functions. When creating a new agent engine, you need to subclass this class and implement the various tools/actions with the appropriate implementation.

Listing 7.9 agent_manager.py (BaseAgent)
class BaseAgent:
    def __init__(self, chat_history=None):
        self._chat_history = chat_history or []
        self.last_message = ""
        self._actions = []
        self._profile = None

    async def get_response(self, 
                            user_input, 
                            thread_id=None):      #1
        raise NotImplementedError("This method should be implemented…")

    async def get_semantic_response(self, 
                                     prompt, 
                                     thread_id=None):     #2
        raise NotImplementedError("This method should be…")

    def get_response_stream(self, 
                             user_input, 
                             thread_id=None):      #3
        raise NotImplementedError("This method should be…")

    def append_chat_history(self, 
                             thread_id, 
                             user_input, 
                             response):      #4
        self._chat_history.append(
            {"role": "user",
             "content": user_input,
             "thread_id": thread_id}
        )
        self._chat_history.append(
            {"role": "bot",
             "content": response, 
             "thread_id": thread_id}
        )

    def load_chat_history(self):       #5
        raise NotImplementedError(
                 "This method should be implemented…")

    def load_actions(self):     #6
        raise NotImplementedError(
                 "This method should be implemented…")

#... not shown – property setters/getters
#1 Calls the LLM and returns the response

#2 Executes a semantic function

#3 Calls the LLM and streams the response

#4 Appends messages to the agent's internal chat history

#5 Loads the chat history, allowing the agent to reload the various histories

#6 Loads the actions available to the agent
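
As a minimal illustration of subclassing, here's a toy engine that fulfills the get_response contract without calling an LLM (BaseAgent is repeated in trimmed form so the sketch runs on its own, and EchoAgent is purely hypothetical):

```python
import asyncio

# Trimmed copy of the BaseAgent abstraction from listing 7.9.
class BaseAgent:
    def __init__(self, chat_history=None):
        self._chat_history = chat_history or []
        self.last_message = ""

    async def get_response(self, user_input, thread_id=None):
        raise NotImplementedError("This method should be implemented…")

# A hypothetical engine: echoes the input instead of calling an LLM.
class EchoAgent(BaseAgent):
    async def get_response(self, user_input, thread_id=None):
        self.last_message = f"echo: {user_input}"  # a real engine calls an LLM here
        return self.last_message

agent = EchoAgent()
print(asyncio.run(agent.get_response("hello")))  # echo: hello
```

A real engine would replace the echo line with an API call, but the shape is the same: override the abstract methods, and the rest of Nexus can drive the agent without knowing which model sits behind it.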

Open the nexus_agents/oai_agent.py file in VS Code. Listing 7.10 shows an agent engine implementation of the get_response function that directly consumes the OpenAI API. self.client is an OpenAI client created earlier during class initialization, and you've seen the rest of the code in earlier examples.

Listing 7.10 oai_agent.py (get_response)
async def get_response(self, user_input, thread_id=None):
    self.messages += [{"role": "user",
                     "content": user_input}]      #1
    response = self.client.chat.completions.create(     #2
        model=self.model,
        messages=self.messages,
        temperature=0.7,      #3
    )
    self.last_message = str(response.choices[0].message.content)
    return self.last_message     #4
#1 Adds the user input to the message stack.

#2 The client was created earlier and is now used to create the chat completion.

#3 The temperature is hardcoded but could be made configurable.

#4 Returns the response from the chat completion call.

Like the agent profiles, Nexus uses a plugin system that allows you to place new agent engine definitions in the nexus_agents folder. If you create your own agent engine, you just need to place it in this folder for Nexus to discover it.
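
A folder-based plugin system like this can be sketched with importlib: scan the folder for Python files, import each one, and read whatever attribute identifies the agent. The following is an illustrative stand-alone version (AGENT_NAME is a made-up marker), not Nexus's actual discovery code:

```python
# Illustrative plugin discovery: import every .py file in a folder
# and read an attribute that identifies the agent engine.
import importlib.util
import tempfile
from pathlib import Path

plugins_dir = Path(tempfile.mkdtemp())
(plugins_dir / "my_agent.py").write_text("AGENT_NAME = 'MyAgent'\n")

def discover_agents(folder):
    names = []
    for path in sorted(folder.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)   # runs the plugin file
        names.append(getattr(module, "AGENT_NAME", path.stem))
    return names

print(discover_agents(plugins_dir))  # ['MyAgent']
```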

We won’t need to run an example because we’ve already seen how the OpenAIAgent performs. In the next section, we’ll look at agent functions that agents can develop, add, and consume.

7.5 Giving an agent actions and tools

Like SK, Nexus supports both native (code) and semantic (prompt) functions. Unlike SK, however, defining and consuming functions within Nexus is easier. All you need to do is write functions into a Python file and place them into the nexus_actions folder.

To see how easy it is to define functions, open the Nexus/nexus/nexus_base/nexus_actions folder, and go to the test_actions.py file. Listing 7.11 shows two function definitions. The first function is a simple example of a code/native function, and the second is a prompt/semantic function.

Listing 7.11 test_actions.py (native/semantic function definitions)
from nexus.nexus_base.action_manager import agent_action


@agent_action                                              #1
def get_current_weather(location, unit="fahrenheit"):     #1
    """Get the current weather in a given location"""      #2
    return f"""
The current weather in {location} is 0 {unit}.
"""      #3


@agent_action      #4
def recommend(topic):
    """
    System:                                                   #5
        Provide a recommendation for a given {{topic}}.
        Use your best judgment to provide a recommendation.
    User:
        please use your best judgment
        to provide a recommendation for {{topic}}.           #5
    """
    pass      #6
#1 Applies the agent_action decorator to turn the function into an action.

#2 Gives the function a descriptive comment.

#3 The code can be as simple or complex as needed.

#4 Applies the agent_action decorator to turn the function into an action.

#5 The function comment becomes the prompt and can include placeholders.

#6 Semantic functions implement no code.

Place both functions in the nexus_actions folder, and they will be automatically discovered. Adding the agent_action decorator allows the functions to be inspected and automatically generates the OpenAI standard tool specification. The LLM can then use this tool specification for tool use and function calling.
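
To get a feel for what such a decorator can derive, here's a hedged sketch that uses inspect to build an OpenAI-style tool spec from a function's name, docstring, and parameters (the real Nexus decorator is more involved; this version simply treats every parameter as a string):

```python
import inspect

# A hedged, simplified agent_action-style decorator: it attaches an
# OpenAI-style tool spec derived from the function's signature.
def agent_action(func):
    sig = inspect.signature(func)
    func.tool_spec = {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": {
                    name: {"type": "string", "description": name}
                    for name in sig.parameters
                },
                "required": [
                    name for name, p in sig.parameters.items()
                    if p.default is inspect.Parameter.empty
                ],
            },
        },
    }
    return func

@agent_action
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    return f"The current weather in {location} is 0 {unit}."

spec = get_current_weather.tool_spec["function"]
print(spec["name"], spec["parameters"]["required"])
```

Note how the parameter with a default value (unit) drops out of the required list, matching the specification shown in listing 7.12.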

Listing 7.12 shows the generated OpenAI tool specification for both functions shown previously in listing 7.11. For the semantic function, the prompt itself becomes the tool description. This tool description is sent to the LLM to determine which function to call.

Listing 7.12 test_actions: the generated OpenAI tool specifications
{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": 
        "Get the current weather in a given location",    #1
        "parameters": {
            "type": "object",
            "properties": {      #2
                "location": {
                    "type": "string",
                    "description": "location"
                },
                "unit": {
                    "type": "string",
                    "enum": [
                        "celsius",
                        "fahrenheit"
                    ]
                }
            },
            "required": [
                "location"
            ]
        }
    }
}
{
    "type": "function",
    "function": {
        "name": "recommend",
        "description": """
    System:
    Provide a recommendation for a given {{topic}}.
Use your best judgment to provide a recommendation.
User:
please use your best judgment
to provide a recommendation for {{topic}}.""",      #3
        "parameters": {
            "type": "object",
            "properties": {       #4
                "topic": {
                    "type": "string",
                    "description": "topic"
                }
            },
            "required": [
                "topic"
            ]
        }
    }
}
#1 The function comment becomes the function tool description.

#2 The function's input arguments are extracted and added to the specification.

#3 The function comment becomes the function tool description.

#4 The function's input arguments are extracted and added to the specification.

The agent engine also needs to implement this capability in order to execute functions and other components. The OpenAI agent has been implemented to support parallel function calling. Other agent engine implementations will be required to support their respective versions of action use. Fortunately, the OpenAI tool definition is becoming the standard, and many platforms adhere to it.

Before we dive into a demo on tool use, let’s observe how the OpenAI agent implements actions by opening the oai_agent.py file in VS Code. The following listing shows the top of the agent’s get_response_stream function and its implementation of function calling.

Listing 7.13 oai_agent.py (get_response_stream: calling the API)
def get_response_stream(self, user_input, thread_id=None):
    self.last_message = ""
    self.messages += [{"role": "user", "content": user_input}]
    if self.tools and len(self.tools) > 0:    #1
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            tools=self.tools,      #2
            tool_choice="auto",      #3
        )
    else:     #4
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls     #5
#1 Detects whether the agent has any tools enabled and available

#2 Sets the tools on the chat completion call

#3 Makes sure the LLM knows it can choose any tool

#4 Calls the LLM in the standard manner if there are no tools

#5 Detects whether the LLM used any tools

Executing the functions follows, as shown in listing 7.14. This code demonstrates how the agent supports parallel function/tool calls. The calls are parallel in the sense that the agent executes them together, in no particular order. In chapter 11, we'll look at planners that allow actions to be called in ordered sequences.

Listing 7.14 oai_agent.py (get_response_stream: executing the tool calls)
if tool_calls:     #1
    available_functions = {
        action["name"]: action["pointer"] for action in self.actions
    }     #2
    self.messages.append(
        response_message
    )
    for tool_call in tool_calls:     #3
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(
            **function_args, _caller_agent=self
        )

        self.messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": str(function_response),
            }
        )
    second_response = self.client.chat.completions.create(
        model=self.model,
        messages=self.messages,
    )      #4
    response_message = second_response.choices[0].message
#1 Continues if tool calls are detected in the LLM response.

#2 Loads the pointers to the actual function implementations to execute the code.

#3 Loops over all the functions the LLM wants to call; there may be multiple function calls.

#4 Makes a second LLM call with the results of the tool calls.
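
The dispatch pattern in listing 7.14 boils down to a name-to-function map. Here's a self-contained sketch with mocked tool calls (the plain dicts stand in for the objects the OpenAI API returns in response_message.tool_calls):

```python
import json

# Two toy actions matching the test_actions examples from listing 7.11.
def get_current_weather(location, unit="fahrenheit"):
    return f"The current weather in {location} is 0 {unit}."

def recommend(topic):
    return f"Recommendation for {topic}"

available_functions = {
    "get_current_weather": get_current_weather,
    "recommend": recommend,
}

tool_calls = [  # mocked stand-ins for response_message.tool_calls
    {"name": "get_current_weather", "arguments": '{"location": "Paris"}'},
    {"name": "recommend", "arguments": '{"topic": "soup"}'},
]

results = []
for call in tool_calls:                       # order doesn't matter
    func = available_functions[call["name"]]  # resolve by name
    args = json.loads(call["arguments"])      # arguments arrive as JSON
    results.append(func(**args))
print(results[0])  # The current weather in Paris is 0 fahrenheit.
```

In the real agent, each result is appended back to the message list as a "tool" message, and a second LLM call composes the single response the user sees.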

To demo this, start up Nexus in the debugger by pressing F5. Then, select the two test actions—recommend and get_current_weather—and the terse persona/profile Olly. Figure 7.9 shows the result of entering a query and the agent responding by using both tools in its response.

Figure 7.9 The agent using both tools in parallel and responding with a single response

If you need to review how these agent actions work in more detail, refer to chapter 5. The underlying code is more complex and out of the scope of review here. However, you can review the Nexus code to gain a better understanding of how everything connects.

Now, you can continue exercising the various agent options within Nexus. Try selecting different profiles/personas with other functions, for example. In the next chapter, we unveil how agents can consume external memory and knowledge using patterns such as Retrieval Augmented Generation (RAG).

7.6 Exercises

Use the following exercises to improve your knowledge of the material:

  • Exercise 1—Explore Streamlit Basics (Easy)

Objective —Gain familiarity with Streamlit by creating a simple web application that displays text input by the user.

Tasks:

    • Follow the Streamlit documentation to set up a basic application.
    • Add a text input and a button. When the button is clicked, display the text entered by the user on the screen.
  • Exercise 2—Create a Basic Agent Profile

Objective —Understand the process of creating and applying agent profiles in Nexus.

Tasks:

    • Create a new agent profile with a unique persona. This persona should have a specific theme or characteristic (e.g., a historian).
    • Define a basic set of responses that align with this persona.
    • Test the persona by interacting with it through the Nexus interface.
  • Exercise 3—Develop a Custom Action

Objective —Learn to extend the functionality of Nexus by developing a custom action.

Tasks:

    • Develop a new action (e.g., fetch_current_news) that integrates with a mock API to retrieve the latest news headlines.
    • Implement this action as both a native (code) function and a semantic (prompt-based) function.
    • Test the action in the Nexus environment to ensure it works as expected.
  • Exercise 4 —Integrate a Third-Party API

Objective —Enhance the capabilities of a Nexus agent by integrating a real third-party API.

Tasks:

    • Choose a public API (e.g., weather or news API), and create a new action that fetches data from this API.
    • Incorporate error handling and ensure that the agent can gracefully handle API failures or unexpected responses.
    • Test the integration thoroughly within Nexus.

Summary

  • Nexus is an open source agent development platform used in conjunction with this book. It’s designed to develop, test, and host AI agents and is built on Streamlit for creating interactive dashboards and chat interfaces.
  • Streamlit, a Python web application framework, enables the rapid development of user-friendly dashboards and chat applications. This framework facilitates the exploration and interaction with various agent features in a streamlined manner.
  • Nexus supports creating and customizing agent profiles and personas, allowing users to define their agents’ personalities and behaviors. These profiles dictate how agents interact with and respond to user inputs.
  • The Nexus platform allows for developing and integrating semantic (prompt-based) and native (code-based) actions and tools within agents. This enables the creation of highly functional and responsive agents.
  • As an open source platform, Nexus is designed to be extensible, encouraging contributions and the addition of new features, tools, and agent capabilities by the community.
  • Nexus is flexible, supporting various deployment options, including a web interface, API, and a Discord bot in future iterations, accommodating a wide range of development and testing needs.

8 Understanding agent memory and knowledge

This chapter covers

  • Retrieval in knowledge/memory in AI functions
  • Building retrieval augmented generation workflows with LangChain
  • Retrieval augmented generation for agentic knowledge systems in Nexus
  • Retrieval patterns for memory in agents
  • Improving augmented retrieval systems with memory and knowledge compression

Now that we’ve explored agent actions using external tools, such as plugins in the form of native or semantic functions, we can look at the role of memory and knowledge using retrieval in agents and chat interfaces. We’ll describe memory and knowledge and how they relate to prompt engineering strategies, and then, to understand memory and knowledge, we’ll investigate document indexing, construct retrieval systems with LangChain, use memory with LangChain, and build semantic memory using Nexus.

8.1 Understanding retrieval in AI applications

Retrieval in agent and chat applications is a mechanism for obtaining knowledge to keep in storage that is typically external and long-lived. Unstructured knowledge includes conversation or task histories, facts, preferences, or other items necessary for contextualizing a prompt. Structured knowledge, typically stored in databases or files, is accessed through native functions or plugins.

Memory and knowledge, as shown in figure 8.1, are elements used to add further context and relevant information to a prompt. Prompts can be augmented with everything from information about a document to previous tasks or conversations and other reference information.

Figure 8.1 Memory and retrieval augmentation of a prompt using the prompt engineering strategies of using external tools and providing reference text

The prompt engineering strategies shown in figure 8.1 can be applied to memory and knowledge. Knowledge isn’t considered memory but rather an augmentation of the prompt from existing documents. Both knowledge and memory use retrieval as the basis for how unstructured information can be queried.

The retrieval mechanism, called retrieval augmented generation (RAG), has become a standard for providing relevant context to a prompt. The exact mechanism that powers RAG also powers memory/knowledge, and it’s essential to understand how it works. In the next section, we’ll examine what RAG is.

8.2 The basics of retrieval augmented generation (RAG)

RAG has become a popular mechanism for supporting document chat or question-and-answer chat. The system typically works by a user supplying a relevant document, such as a PDF, and then using RAG and a large language model (LLM) to query the document.

Figure 8.2 shows how RAG can allow a document to be queried using an LLM. Before any document can be queried, it must first be loaded, transformed into context chunks, embedded into vectors, and stored in a vector database.

Figure 8.2 The two phases of RAG: first, documents must be loaded, transformed, embedded, and stored; second, documents can be queried using augmented generation.

A user can query previously indexed documents by submitting a query. That query is then embedded into a vector representation to search for similar chunks in the vector database. Content similar to the query is then used as context and populated into the prompt as augmentation. The prompt is pushed to an LLM, which can use the context information to help answer the query.
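This flow can be sketched as a simple prompt-assembly step: the chunks returned by the similarity search are concatenated into a context block ahead of the user's question. The function name and template below are illustrative, not Nexus code:

```python
def build_augmented_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Assemble a RAG-style prompt: retrieved context first, then the query."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Use the following context to answer the question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_augmented_prompt(
    "What color is the sky?",
    ["The sky is blue and beautiful."],  # chunks returned by the vector search
)
print(prompt)
```

The assembled prompt is what gets pushed to the LLM in place of the raw query.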

Unstructured memory/knowledge concepts rely on some format of text-similarity search following the retrieval pattern shown in figure 8.2. Figure 8.3 shows how memory uses the same embedding and vector database components. Rather than preload documents, conversations or parts of a conversation are embedded and saved to a vector database.

Figure 8.3 Memory retrieval for augmented generation uses the same embedding pattern to index items into a vector database.

The retrieval pattern and document indexing are nuanced and require careful consideration to be employed successfully. This requires understanding how data is stored and retrieved, which we’ll start to unfold in the next section.

8.3 Delving into semantic search and document indexing

Document indexing transforms a document’s information to be more easily recovered. How the index will be queried or searched also plays a factor, whether searching for a particular set of words or wanting to match phrase for phrase.

A semantic search is a search for content that matches the searched phrase by words and meaning. The ability to search by meaning, semantically, is potent and worth investigating in some detail. In the next section, we look at how vector similarity search can lay the framework for semantic search.

8.3.1 Applying vector similarity search

Let’s look now at how a document can be transformed into a semantic vector, or a representation of text that can then be used to perform distance or similarity matching. There are numerous ways to convert text into a semantic vector, so we’ll look at a simple one.

Open the chapter_08 folder in a new Visual Studio Code (VS Code) workspace. Create a new environment and pip install the requirements.txt file for all the chapter dependencies. If you need help setting up a new Python environment, consult appendix B.

Now open the document_vector_similarity.py file in VS Code, and review the top section in listing 8.1. This example uses Term Frequency–Inverse Document Frequency (TF–IDF). This numerical statistic reflects how important a word is to a document in a collection or set of documents by increasing proportionally to the number of times a word appears in the document and offset by the frequency of the word in the document set. TF–IDF is a classic measure of understanding one document’s importance within a set of documents.

Listing 8.1 document_vector_similarity.py (converting to vectors)
import plotly.graph_objects as go
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [      #1
    "The sky is blue and beautiful.",
    "Love this blue and beautiful sky!",
    "The quick brown fox jumps over the lazy dog.",
    "A king's breakfast has sausages, ham, bacon, eggs, toast, and beans",
    "I love green eggs, ham, sausages and bacon!",
    "The brown fox is quick and the blue dog is lazy!",
    "The sky is very blue and the sky is very beautiful today",
    "The dog is lazy but the brown fox is quick!"
]

vectorizer = TfidfVectorizer()     #2
X = vectorizer.fit_transform(documents)      #3
#1 Sample documents

#2 Uses TF–IDF for vectorization

#3 Vectorizes the documents.

Let’s break down TF–IDF into its two components using the sample sentence, “The sky is blue and beautiful,” and focusing on the word blue.

Term Frequency (TF)

Term Frequency measures how frequently a term occurs in a document. Because we’re considering only a single document (our sample sentence), the simplest form of the TF for blue can be calculated as the number of times blue appears in the document divided by the total number of words in the document. Let’s calculate it:

Number of times blue appears in the document: 1

Total number of words in the document: 6

TF = 1 ÷ 6 ≈ 0.167

Inverse Document Frequency (IDF)

Inverse Document Frequency measures how important a term is within the entire corpus. It’s calculated by dividing the total number of documents by the number of documents containing the term and then taking the logarithm of that quotient:

IDF = log(Total number of documents ÷ Number of documents containing the word)

In this example, the corpus is a small collection of eight documents, and blue appears in four of these documents.

IDF = log(8 ÷ 4)

TF–IDF calculation

Finally, the TF–IDF score for blue in our sample sentence is calculated by multiplying the TF and the IDF scores:

TF–IDF = TF × IDF

Let’s compute the actual values for TF–IDF for the word blue using the example provided; first, the term frequency (how often the word occurs in the document) is computed as follows:

TF = 1 ÷ 6

Assuming the base of the logarithm is 10 (commonly used), the inverse document frequency is computed as follows:

IDF = log10 (8 ÷ 4)

Now let’s calculate the exact TF–IDF value for the word blue in the sentence, “The sky is blue and beautiful”:

The Term Frequency (TF) is approximately 0.167.

The Inverse Document Frequency (IDF) is approximately 0.301.

Thus, the TF–IDF (TF × IDF) score for blue is approximately 0.050.

This TF–IDF score indicates the relative importance of the word blue in the given document (the sample sentence) within the context of the specified corpus (eight documents, with blue appearing in four of them). Higher TF–IDF scores imply greater importance.
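The hand calculation can be verified in a few lines. Note that scikit-learn's TfidfVectorizer applies smoothing and normalization, so the scores it produces in listing 8.1 will differ slightly from this textbook formula:

```python
import math

tf = 1 / 6               # "blue" occurs once among the six words in the sentence
idf = math.log10(8 / 4)  # 8 documents in the corpus, 4 of them contain "blue"
tf_idf = tf * idf

print(round(tf, 3), round(idf, 3), round(tf_idf, 3))  # 0.167 0.301 0.05
```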

We use TF–IDF here because it’s simple to apply and understand. Now that we have the elements represented as vectors, we can measure document similarity using cosine similarity. Cosine similarity is a measure used to calculate the cosine of the angle between two nonzero vectors in a multidimensional space, indicating how similar they are, irrespective of their size.

Figure 8.4 shows how cosine distance compares the vector representations of two pieces or documents of text. Cosine similarity returns a value from –1 (not similar) to 1 (identical). Cosine distance is a normalized value ranging from 0 to 2, derived by taking 1 minus the cosine similarity. A cosine distance of 0 means identical items, and 2 indicates complete opposites.

Figure 8.4 How cosine similarity is measured
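As a minimal sketch, both measures can be computed in pure Python; the listings in this chapter use scikit-learn's cosine_similarity, which performs the same computation across the whole TF–IDF matrix:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Normalized distance: 0 for identical direction, 2 for opposite."""
    return 1 - cosine_similarity(a, b)

print(cosine_similarity([1, 0], [1, 0]))   # same direction: 1.0
print(cosine_distance([1, 0], [-1, 0]))    # opposite direction: 2.0
```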

Listing 8.2 shows how the cosine similarities are computed using the cosine_similarity function from scikit-learn. Similarities are calculated for each document against all other documents in the set. The computed matrix of similarities for documents is stored in the cosine_similarities variable. Then, in the input loop, the user can select the document to view its similarities to the other documents.

Listing 8.2 document_vector_similarity.py (cosine similarity)
cosine_similarities = cosine_similarity(X)      #1

while True:      #2
    selected_document_index = input(
        f"Enter a document number (0-{len(documents)-1}) or 'exit' to quit: "
    ).strip()

    if selected_document_index.lower() == 'exit':
        break

    if (not selected_document_index.isdigit()
            or not 0 <= int(selected_document_index) < len(documents)):
        print("Invalid input. Please enter a valid document number.")
        continue

    selected_document_index = int(selected_document_index)    #3

    selected_document_similarities = cosine_similarities[selected_document_index]     #4

# code to plot document similarities omitted
#1 Computes document similarities across all vector pairs

#2 Main input loop

#3 Gets the selected document index to compare

#4 Extracts the computed similarities against all documents

Figure 8.5 shows the output of running the sample in VS Code (F5 for debugging mode). After you select a document, you’ll see the similarities between the various documents in the set. A document will have a cosine similarity of 1 with itself. Note that you won’t see a negative similarity because of the TF–IDF vectorization. We’ll look later at other, more sophisticated means of measuring semantic similarity.

Figure 8.5 Cosine similarity between the selected document and the set of documents

The method of vectorization will dictate the measure of semantic similarity between documents. Before we move on to better methods of vectorizing documents, we’ll examine storing vectors to perform vector similarity searches.

8.3.2 Vector databases and similarity search

After vectorizing documents, they can be stored in a vector database for later similarity searches. To demonstrate how this works, we can efficiently replicate a simple vector database in Python code.

Open document_vector_database.py in VS Code, as shown in listing 8.3. This code demonstrates creating a vector database in memory and then allowing users to enter text to search the database and return results. The results returned show the document text and the similarity score.

Listing 8.3 document_vector_database.py
# code above omitted
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
vector_database = X.toarray()     #1

def cosine_similarity_search(query,
                             database, 
                             vectorizer, 
                             top_n=5):     #2
    query_vec = vectorizer.transform([query]).toarray()
    similarities = cosine_similarity(query_vec, database)[0]
    top_indices = np.argsort(-similarities)[:top_n]  # Top n indices
    return [(idx, similarities[idx]) for idx in top_indices]

while True:      #3
    query = input("Enter a search query (or 'exit' to stop): ")
    if query.lower() == 'exit':
        break
    top_n = int(input("How many top matches do you want to see? "))
    search_results = cosine_similarity_search(query,
                                              vector_database, 
                                              vectorizer, 
                                              top_n)

    print("Top Matched Documents:")
    for idx, score in search_results:
        print(f"- {documents[idx]} (Score: {score:.4f})")   #4

    print("\n")
###Output
Enter a search query (or 'exit' to stop): blue
How many top matches do you want to see? 3
Top Matched Documents:
- The sky is blue and beautiful. (Score: 0.4080)
- Love this blue and beautiful sky! (Score: 0.3439)
- The brown fox is quick and the blue dog is lazy! (Score: 0.2560)
#1 Stores the document vectors in an array

#2 Function that performs similarity matching on the query, returning matches and similarity scores

#3 Main input loop

#4 Loops over the results and outputs the text and similarity score

Run this exercise to see the output (F5 in VS Code). Enter any text you like, and see the results of documents being returned. This search form works well for matching words and phrases with similar words and phrases. This form of search misses the word context and meaning from the document. In the next section, we’ll look at a way of transforming documents into vectors that better preserves their semantic meaning.

8.3.3 Demystifying document embeddings

TF–IDF is a simple form that tries to capture semantic meaning in documents. However, it’s unreliable because it only counts word frequency and doesn’t understand the relationships between words. A better and more modern method uses document embedding, a form of document vectorizing that better preserves the semantic meaning of the document.

Embedding networks are constructed by training neural networks on large datasets to map words, sentences, or documents to high-dimensional vectors, capturing semantic and syntactic relationships based on context and relationships in the data. You typically use a pretrained model trained on massive datasets to embed documents and perform embeddings. Models are available from many sources, including Hugging Face and, of course, OpenAI.

In our next scenario, we’ll use an OpenAI embedding model. These models are typically perfect for capturing the semantic context of embedded documents. Listing 8.4 shows the relevant code that uses OpenAI to embed the documents into vectors that are then reduced to three dimensions and rendered into a plot.

Listing 8.4 document_visualizing_embeddings.py (relevant sections)
load_dotenv()      #1
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise ValueError("No API key found. Please check your .env file.")
client = OpenAI(api_key=api_key)     #1            

def get_embedding(text, model="text-embedding-ada-002"):     #2
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text],
              model=model).data[0].embedding                #2

# Sample documents (omitted)

embeddings = [get_embedding(doc) for doc in documents]    #3

embeddings_array = np.array(embeddings)    #4
print(embeddings_array.shape)

pca = PCA(n_components=3)   #5
reduced_embeddings = pca.fit_transform(embeddings_array)
#1 Loads the API key from the .env file and creates the OpenAI client.

#2 Uses the OpenAI client to create the embedding.

#3 Generates a 1536-dimension embedding for each document.

#4 Converts the embeddings to a NumPy array for PCA.

#5 Applies PCA to reduce the dimensions to 3 for plotting.

When a document is embedded using an OpenAI model, it transforms the text into a vector with dimensions of 1536. We can’t visualize this number of dimensions, so we use a dimensionality reduction technique via principal component analysis (PCA) to convert the vector of size 1536 to 3 dimensions.

Figure 8.6 shows the output generated from running the file in VS Code. By reducing the embeddings to 3D, we can plot the output to show how semantically similar documents are now grouped.

Figure 8.6 3D plot of embeddings showing how semantically similar documents are grouped

The choice of which embedding model or service you use is up to you. The OpenAI embedding models are considered the best for general semantic similarity. This has made these models the standard for most memory and retrieval applications. With our understanding of how text can be vectorized with embeddings and stored in a vector database, we can move on to a more realistic example in the next section.

8.3.4 Querying document embeddings from Chroma

We can combine all the pieces and look at a complete example using a local vector database called Chroma DB. Many vector database options exist, but Chroma DB is an excellent local vector store for development or small-scale projects. There are also plenty of more robust options that you can consider later.

Listing 8.5 shows the new and relevant code sections from the document_query_chromadb.py file. Note that the results are scored by distance and not by similarity. Cosine distance is determined by this equation:

Cosine Distance(A,B) = 1 – Cosine Similarity(A,B)

This means that cosine distance will range from 0 for most similar to 2 for semantically opposite in meaning.

Listing 8.5 document_query_chromadb.py (relevant code sections)
embeddings = [get_embedding(doc) for doc in documents]     #1
ids = [f"id{i}" for i in range(len(documents))]           #1

chroma_client = chromadb.Client()               #2
collection = chroma_client.create_collection(
                       name="documents")       #2
collection.add(     #3
    embeddings=embeddings,
    documents=documents,
    ids=ids
)

def query_chromadb(query, top_n=2):      #4
    query_embedding = get_embedding(query)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_n
    )
    return [(id, score, text) for id, score, text in
            zip(results['ids'][0],
                results['distances'][0], 
                results['documents'][0])]

while True:     #5
    query = input("Enter a search query (or 'exit' to stop): ")
    if query.lower() == 'exit':
        break
    top_n = int(input("How many top matches do you want to see? "))
    search_results = query_chromadb(query, top_n)

    print("Top Matched Documents:")
    for id, score, text in search_results:
        print(f"""
ID:{id} TEXT: {text} SCORE: {round(score, 2)}
""")    #5

    print("\n")
###Output
Enter a search query (or 'exit' to stop): dogs are lazy
How many top matches do you want to see? 3
Top Matched Documents:
ID:id7 TEXT: The dog is lazy but the brown fox is quick! SCORE: 0.24
ID:id5 TEXT: The brown fox is quick and the blue dog is lazy! SCORE: 0.28
ID:id2 TEXT: The quick brown fox jumps over the lazy dog. SCORE: 0.29
#1 Generates an embedding for each document and assigns it an ID

#2 Creates the Chroma DB client and a collection

#3 Adds the document embeddings to the collection

#4 Queries the data store and returns the top n relevant documents

#5 Input loop for user queries and output of the relevant documents/scores

As the earlier scenario demonstrated, you can now query the documents using semantic meaning rather than just key terms or phrases. These scenarios should now provide the background to see how the retrieval pattern works at a low level. In the next section, we’ll see how the retrieval pattern can be employed using LangChain.

8.4 Constructing RAG with LangChain

LangChain 最初是一个开源项目,专门用于抽象跨多个数据源和向量存储的检索模式。此后,它已经发展成为功能更加强大的工具,但其核心功能仍然为实现检索提供了优秀的方案。

LangChain began as an open source project specializing in abstracting the retrieval pattern across multiple data sources and vector stores. It has since morphed into much more, but foundationally, it still provides excellent options for implementing retrieval.

图 8.7 展示了 LangChain 的一个流程图,该流程图描述了文档存储以供检索的过程。这些步骤可以全部或部分地复制到内存检索中。文档检索和内存检索的关键区别在于数据源以及内容的转换方式。

Figure 8.7 shows a diagram from LangChain that identifies the process of storing documents for retrieval. These same steps may be replicated in whole or in part to implement memory retrieval. The critical difference between document and memory retrieval is the source and how content is transformed.

Figure 8.7 The steps for storing documents for later retrieval: load, transform, embed, and store

我们将探讨如何使用 LangChain 实现这些步骤,并了解实现过程中涉及的细微差别和细节。下一节,我们将首先使用 LangChain 来拆分和加载文档。

We’ll examine how to implement each of these steps using LangChain and understand the nuances and details accompanying this implementation. In the next section, we’ll start by splitting and loading documents with LangChain.

8.4.1使用 LangChain 拆分和加载文档

8.4.1 Splitting and loading documents with LangChain

检索机制会为给定的提示添加与请求相关的特定信息。例如,您可以请求有关本地文档的详细信息。在早期的语言模型中,由于令牌限制,无法将整个文档作为提示的一部分提交。

Retrieval mechanisms augment the context of a given prompt with specific information relevant to the request. For example, you may request detailed information about a local document. With earlier language models, submitting the whole document as part of the prompt wasn’t an option due to token limitations.

如今,我们可以将整篇文档作为快速请求的一部分提交给许多商业语言学习模型(例如 GPT-4 Turbo)。然而,由于词元数量的增加,结果可能并不理想,而且成本也可能更高。因此,更好的选择是将文档拆分,并使用相关部分来请求上下文——这正是 RAG 和内存模型所做的。

Today, we could submit a whole document for many commercial LLMs, such as GPT-4 Turbo, as part of a prompt request. However, the results may not be better and would likely cost more because of the increased number of tokens. Therefore, a better option is to split the document and use the relevant parts to request context—precisely what RAG and memory do.
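The augmentation step at the heart of RAG can be sketched in a few lines of plain Python. The function name and prompt wording here are illustrative choices, not part of any library:

```python
def build_augmented_prompt(query, retrieved_chunks):
    """Combine the retrieved document chunks with the user's query.

    The resulting string is what would be sent to the LLM in place
    of the raw query (or the whole document).
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

chunks = [
    "Georgy Porgy, pudding and pie, kissed the girls and made them cry.",
    "The dog is lazy but the brown fox is quick!",
]
prompt = build_augmented_prompt("Who made the girls cry?", chunks)
print(prompt.splitlines()[0])  # → Use the following context to answer the question.
```

In a real pipeline, the chunks would come from a similarity search (such as the `query_chromadb` function shown earlier), and the returned string would replace the raw query in the chat-completion call.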

将文档拆分为语义上和具体内容相关的部分至关重要。图 8.8 展示了如何拆分包含鹅妈妈童谣的 HTML 文档。通常,将文档拆分为上下文相关的语义块需要仔细考虑。

Splitting a document is essential in breaking down content into semantically and specifically relevant sections. Figure 8.8 shows how to break down an HTML document containing the Mother Goose nursery rhymes. Often, splitting a document into contextual semantic chunks requires careful consideration.

Figure 8.8 How a document should ideally be split into sections to better express semantic and contextual meaning

理想情况下,我们将文档分割成块时,应根据相关性和语义进行划分。虽然语言学习模型(LLM)或代理可以帮助我们实现这一点,但我们将重点介绍 LangChain 中现有的文档分割工具包选项。本章稍后将介绍一种语义函数,它可以帮助我们对内容进行语义分割,以便进行词嵌入。

Ideally, when we split documents into chunks, they are broken down by relevance and semantic meaning. While an LLM or agent could help us with this, we’ll look at current toolkit options within LangChain for splitting documents. Later in this chapter, we’ll look at a semantic function that can assist us in semantically dividing content for embeddings.

接下来,请langchain_load_splitting.py在 VS Code 中打开代码,如清单 8.6 所示。这段代码接续上一节清单 8.5 的内容。这次,我们加载的是鹅妈妈童谣,而不是示例文档。

For the next exercise, open langchain_load_splitting.py in VS Code, as shown in listing 8.6. This code shows where we left off from listing 8.5, in the previous section. Instead of using the sample documents, we’re loading the Mother Goose nursery rhymes this time.

Listing 8.6 langchain_load_splitting.py (sections and output)
from langchain_community.document_loaders import UnstructuredHTMLLoader    #1
from langchain.text_splitter import RecursiveCharacterTextSplitter
#previous code

loader = UnstructuredHTMLLoader(
                   "sample_documents/mother_goose.xhtml")   #2
data = loader.load()    #3

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=25,     #4
    length_function=len,
    add_start_index=True,
)
documents = text_splitter.split_documents(data)

documents = [doc.page_content
                 for doc in documents][100:350]   #5

embeddings = [get_embedding(doc) for doc in documents]     #6
ids = [f"id{i}" for i in range(len(documents))]
###Output
Enter a search query (or 'exit' to stop): who kissed the girls and made 
them cry?
How many top matches do you want to see? 3
Top Matched Documents:
ID:id233 TEXT: And chid her daughter,
        And kissed my sister instead of me. SCORE: 0.4…
#1 Imports the new LangChain modules

#2 Loads the document as HTML

#3 Loads the document

#4 Splits the document into chunks of 100 characters with a 25-character overlap

#5 Embeds only 250 chunks, which is more efficient

#6 Returns the embeddings for each document

请注意清单 8.6 中的 HTML 文档,它被分割成 100 个字符的片段,片段之间有 25 个字符的重叠。这种重叠设计可以避免文档的各个部分截断特定内容。我们选择这种分割器是因为它易于使用、设置和理解。

Note in listing 8.6 that the HTML document is split into 100-character chunks with a 25-character overlap. The overlap helps keep a chunk from cutting off a thought mid-sentence. We selected this splitter for the exercise because it's easy to use, set up, and understand.
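A simplified character splitter makes the effect of these two settings concrete. Unlike `RecursiveCharacterTextSplitter`, this sketch ignores separators such as newlines and always cuts at exact character positions:

```python
def split_with_overlap(text, chunk_size=100, chunk_overlap=25):
    """Naive fixed-size splitter: each chunk starts where the
    previous one ended, minus the overlap."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

text = "The quick brown fox jumps over the lazy dog. " * 10
chunks = split_with_overlap(text, chunk_size=100, chunk_overlap=25)
# Consecutive chunks share their last/first 25 characters
print(len(chunks), all(
    a[-25:] == b[:25] for a, b in zip(chunks, chunks[1:])
))  # → 6 True
```

Because each chunk repeats the tail of the previous one, a sentence that straddles a cut point still appears intact in at least one chunk.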

langchain_load_splitting.py在 VS Code 中运行该文件(按 F5)。输入查询语句,查看结果。清单 8.6 中的输出结果针对特定示例给出了良好的结果。请记住,为了降低成本并缩短练习时间,我们仅嵌入了 250 个文档片段。当然,您也可以尝试嵌入整个文档或使用较小的输入文档示例。

Go ahead and run the langchain_load_splitting.py file in VS Code (F5). Enter a query, and see what results you get. The output in listing 8.6 shows good results for a specific example. Remember that we embedded only 250 document chunks to reduce costs and keep the exercise short. Of course, you can always try embedding the entire document or using a smaller input document.

构建完善的检索系统,文档分割过程或许是最关键的环节。你可以使用多种方法来分割文档,包括同时使用多个方法。多个方法可以分别处理并分割文档,生成同一文档的多个嵌入视图。下一节,我们将探讨一种更通用的文档分割技术——使用词元和词元化。

Perhaps the most critical element of building proper retrieval is the process of document splitting. You can split a document using numerous methods, even several at once: each method makes its own pass over the document, producing multiple embedded views of the same content. In the next section, we'll examine a more general technique for splitting documents, using tokens and tokenization.

8.4.2使用 LangChain 按 token 拆分文档

8.4.2 Splitting documents by token with LangChain

分词是将文本分解成词元的过程。词元代表文本中的一个简洁元素,它可以是像“hold”这样的单词,甚至是像左花括号({)这样的符号,具体取决于上下文。

Tokenization is the process of breaking text into word tokens. A token represents a succinct element in the text: it could be a word like hold or even a symbol like the left curly brace ({), depending on what's relevant.

使用分词法对文档进行分割,可以为语言模型如何解释文本以及语义相似性分析提供更好的基础。分词法还可以去除无关字符(例如空格),使文档相似性匹配更加精准,从而通常能提供更好的结果。

Splitting documents using tokenization provides a better base for how the text will be interpreted by language models and for semantic similarity. Tokenization also allows the removal of irrelevant characters, such as whitespace, making the similarity matching of documents more relevant and generally providing better results.
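A whitespace tokenizer is only a rough stand-in for a subword encoder such as tiktoken, but this sketch shows the mechanics of token-based chunking: whitespace disappears during tokenization, so each chunk holds a fixed number of tokens rather than a fixed number of characters:

```python
def split_by_tokens(text, chunk_size=50, chunk_overlap=10):
    """Split on word tokens rather than characters. Whitespace is
    collapsed by the tokenization step, so chunks vary in raw
    character length but hold a fixed number of tokens."""
    tokens = text.split()   # crude tokenizer; tiktoken uses subwords
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks

rhyme = """Georgy   Porgy,   pudding and pie,
        Kissed the girls and made them cry."""
for chunk in split_by_tokens(rhyme, chunk_size=8, chunk_overlap=2):
    print(chunk)
```

Note how the irregular whitespace in the source vanishes from the chunks, which is one reason token-based splits tend to match more cleanly during similarity search.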

接下来,请langchain_token_splitting.py在 VS Code 中打开文件,如清单 8.7 所示。现在,我们使用词法分析来分割文档,将文档分成大小不等的部分。大小不等的原因是原始文档中存在大量空白区域。

For the next code exercise, open the langchain_token_splitting.py file in VS Code, as shown in listing 8.7. Now we split the document using tokenization, which breaks the document into sections of unequal size. The unequal size results from the large sections of whitespace of the original document.

Listing 8.7 langchain_token_splitting.py (relevant new code)
loader = UnstructuredHTMLLoader("sample_documents/mother_goose.xhtml")
data = loader.load()
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=50, chunk_overlap=10      #1
)

documents = text_splitter.split_documents(data)
documents = [doc for doc in documents][8:94]      #2

db = Chroma.from_documents(documents, OpenAIEmbeddings())

def query_documents(query, top_n=2):
    docs = db.similarity_search(query, top_n)      #3
    return docs
###Output
Created a chunk of size 68, 
which is longer than the specified 50
Created a chunk of size 67, 
which is longer than the specified 50     #4
Enter a search query (or 'exit' to stop): 
                     who kissed the girls and made them cry?
How many top matches do you want to see? 3
Top Matched Documents:
Document 1: GEORGY PORGY

        Georgy Porgy, pudding and pie,
        Kissed the girls and made them cry.
#1 Updates to 50 tokens with a 10-token overlap

#2 Selects only the documents containing the rhymes

#3 Uses the database similarity search

#4 Splits the document into irregularly sized chunks because of whitespace

在 VS Code 中运行langchain_token_splitting.py代码(按 F5)。你可以使用上次的查询语句,也可以使用你自己的查询语句。注意结果比之前的练习明显好得多。但是,由于查询语句使用了几个顺序相同的相似词,结果仍然值得怀疑。

Run the langchain_token_splitting.py code in VS Code (F5). You can use the query we used last time or your own. Notice how the results are significantly better than the previous exercise. However, the results are still suspect because the query uses several similar words in the same order.

更好的测试方法是尝试使用语义相似但用词不同的短语,并检查结果。在代码仍在运行的情况下,输入一个新的查询短语:Why are the girls crying?清单 8.8 显示了执行该查询的结果。如果您自己运行此示例并向下滚动输出,您会看到“Georgy Porgy”出现在返回的第二个或第三个文档中。

A better test would be to try a semantically similar phrase but one that uses different words and check the results. With the code still running, enter a new phrase to query: Why are the girls crying? Listing 8.8 shows the results of executing that query. If you run this example yourself and scroll down over the output, you’ll see Georgy Porgy appear in either the second or third returned document.

Listing 8.8 Query: Who made the girls cry?
Enter a search query (or 'exit' to stop): Who made the girls cry?
How many top matches do you want to see? 3
Top Matched Documents:
Document 1: WILLY, WILLY

        Willy, Willy Wilkin…
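The gap between these two queries is easy to quantify. This toy function, a purely illustrative stand-in for keyword matching, shows how little lexical overlap the second query has with the Georgy Porgy rhyme even though it's semantically close:

```python
def lexical_overlap(query, document):
    """Fraction of query words that literally appear in the document,
    a crude proxy for keyword search (no stemming or normalization)."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words)

rhyme = "Georgy Porgy, pudding and pie, kissed the girls and made them cry."
print(lexical_overlap("who kissed the girls and made them cry?", rhyme))  # → 0.75
print(lexical_overlap("why are the girls crying?", rhyme))                # → 0.4
```

Keyword matching rewards the first query's shared word order and vocabulary; only embeddings capture that the second query means nearly the same thing.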

本练习展示了如何运用各种检索方法按语义返回文档。在此基础上,我们可以了解如何将 RAG 应用于知识和记忆系统。下一节将讨论 RAG 在智能体知识和智能体系统中的应用。

This exercise shows how various retrieval methods can be employed to return documents semantically. With this base established, we can see how RAG can be applied to knowledge and memory systems. The following section will discuss RAG as it applies to knowledge of agents and agentic systems.

8.5将 RAG 应用于建筑代理人知识

8.5 Applying RAG to building agent knowledge

智能体的知识涵盖了利用红绿灯算法(RAG)对非结构化文档进行语义搜索。这些文档可以是任何类型的文档,从PDF到Microsoft Word文档,以及所有文本,包括代码。智能体的知识还包括利用非结构化文档进行问答、参考查找、信息增强以及其他未来模式的构建。

Knowledge in agents encompasses employing RAG to search semantically across unstructured documents. These documents could be anything from PDFs to Microsoft Word documents and all text, including code. Agentic knowledge also includes using unstructured documents for Q&A, reference lookup, information augmentation, and other future patterns.

Nexus是与本书同步开发的智能体平台,已在前一章中介绍。它为智能体配备了完整的知识和记忆系统。本节将揭示该知识系统的工作原理。

Nexus, the agent platform developed in tandem with this book and introduced in the previous chapter, employs complete knowledge and memory systems for agents. In this section, we’ll uncover how the knowledge system works.

要仅为本章安装 Nexus,请参阅清单 8.9。在chapter_08文件夹内打开终端,并执行清单中的命令,以下载、安装并以普通模式或开发模式运行 Nexus。如果您想参考代码,则应在开发模式下安装项目,并配置调试器以从 VS Code 运行 Streamlit 应用程序。如果您需要复习任何步骤,请参阅第 7 章。

To install Nexus for just this chapter, see listing 8.9. Open a terminal within the chapter_08 folder, and execute the commands in the listing to download, install, and run Nexus in normal or development mode. If you want to refer to the code, you should install the project in development and configure the debugger to run the Streamlit app from VS Code. Refer to chapter 7 if you need a refresher on any of these steps.

Listing 8.9 Installing Nexus
# to install and run
pip install git+https://github.com/cxbxmxcx/Nexus.git

nexus run
# install in development mode
git clone https://github.com/cxbxmxcx/Nexus.git

# Install the cloned repository in editable mode
pip install -e Nexus

无论你登录后决定使用哪种方法运行应用程序,请导航至知识库管理器页面,如图 8.9 所示。创建一个新的知识库,然后上传sample_documents/back_to_the_future.txt电影剧本。

Regardless of which method you decide to run the app in after you log in, navigate to the Knowledge Store Manager page, as shown in figure 8.9. Create a new Knowledge Store, and then upload the sample_documents/back_to_the_future.txt movie script.

Figure 8.9 Adding a new knowledge store and populating it with a document

脚本文件较大,加载、分块和将各个部分嵌入到 Chroma DB 向量数据库中可能需要一些时间。请等待索引完成,然后您可以检查嵌入结果并运行查询,如图 8.10 所示。

The script is a large document, and it may take a while to load, chunk, and embed the parts into the Chroma DB vector database. Wait for the indexing to complete, and then you can inspect the embeddings and run a query, as shown in figure 8.10.

Figure 8.10 The embeddings and document query view

现在,我们可以将知识库连接到支持的代理并提出问题。使用左上角的选择器在 Nexus 界面中选择聊天页面。然后,选择代理和知识库time_travel,如图 8.11 所示。您还需要选择一个支持知识库的代理引擎。每个代理引擎都需要进行正确的配置才能访问。

Now, we can connect the knowledge store to a supported agent and ask questions. Use the top-left selector to choose the chat page within the Nexus interface. Then, select an agent and the time_travel knowledge store, as shown in figure 8.11. You will also need to select an agent engine that supports knowledge. Each of the multiple agent engines requires the proper configuration to be accessible.

Figure 8.11 Enabling a knowledge store for use by an agent

目前,截至本章所述,Nexus 一次仅支持访问一个知识库。在未来的版本中,智能体或许能够同时选择多个知识库。这可能包括更高级的选项,例如语义知识以及采用其他形式的 RAG(红绿灯)算法。

Currently, as of this chapter, Nexus supports access to only a single knowledge store at a time. In a future version, agents may be able to select multiple knowledge stores at a time. This may include more advanced options, from semantic knowledge to employing other forms of RAG.

您还可以在知识库管理器页面的“配置”选项卡中配置 RAG 设置,如图 8.12 所示。目前,您可以从拆分器类型(“分块选项”字段)中选择文档分块方式,以及“块大小”字段和“重叠”字段。

You can also configure the RAG settings within the Configuration tab of the Knowledge Store Manager page, as shown in figure 8.12. As of now, you can select from the type of splitter (Chunking Option field) to chunk the document, along with the Chunk Size field and Overlap field.

Figure 8.12 Managing the knowledge store splitting and chunking options

目前,LangChain 仅支持加载、分割、分块和嵌入这几个基本选项。未来版本的 Nexus 将提供更多选项和模式。支持其他选项的代码可以直接添加到 Nexus 中。

The loading, splitting, chunking, and embedding options provided are the only basic options supported by LangChain for now. In future versions of Nexus, more options and patterns will be offered. The code to support other options can be added directly to Nexus.

我们不会详细介绍执行 RAG 算法的代码,因为它与我们之前讲解过的部分非常相似。您可以查看 Nexus 代码,特别是文件KnowledgeManager中的类knowledge_manager.py

We won’t cover the code that performs the RAG as it’s very similar to what we already covered. Feel free to review the Nexus code, particularly the KnowledgeManager class in the knowledge_manager.py file.

虽然知识和记忆的检索模式在增强方面非常相似,但两者在填充存储方面却有所不同。下一节,我们将探讨智能体记忆的独特之处。

While the retrieval patterns for knowledge and memory are quite similar for augmentation, the two patterns differ when it comes to populating the stores. In the next section, we’ll explore what makes memory in agents unique.

8.6在智能体系统中实现记忆

8.6 Implementing memory in agentic systems

在智能体和人工智能应用中,记忆通常与认知记忆功能使用相同的术语来描述。认知记忆指的是我们用来记住30秒前做了什么或30年前身高的那种记忆。计算机记忆也是智能体记忆的重要组成部分,但本节暂不讨论。

Memory in agents and AI applications is often described in the same terms as cognitive memory functions. Cognitive memory describes the type of memory we use to remember what we did 30 seconds ago or how tall we were 30 years ago. Computer memory is also an essential element of agent memory, but one we won’t consider in this section.

图 8.13 展示了记忆如何分解为感觉记忆、短期记忆和长期记忆。这种记忆方式可以应用于人工智能体,下表描述了每种记忆形式如何映射到智能体功能:

Figure 8.13 shows how memory is broken down into sensory, short-term, and long-term memory. This memory can be applied to AI agents, and this list describes how each form of memory maps to agent functions:

  • 人工智能中的感觉记忆 ——类似于红绿灯算法的功能,但使用图像/音频/触觉数据形式。它能短暂地保存输入数据(例如文本和图像)以供即时处理,但不会进行长期存储。
  • Sensory memory in AI —Functions such as RAG but with images/audio/haptic data forms. Briefly holds input data (e.g., text and images) for immediate processing but not long-term storage.
  • 人工智能中的短期/工作记忆 ——充当对话历史的活跃记忆缓冲区。我们保存少量近期输入和上下文信息,以便立即进行分析和生成响应。在 Nexus 中,短期和长期对话记忆也保存在对话线程的上下文中。
  • Short-term/working memory in AI —Acts as an active memory buffer of conversation history. We’re holding a limited amount of recent input and context for immediate analysis and response generation. Within Nexus, short- and long-term conversational memory is also held in the context of the thread.
  • 人工智能中的长期记忆 ——与智能体或用户生活相关的长期记忆存储。语义记忆提供了一种强大的能力,可以存储和检索相关的全局或局部事实和概念。
  • Long-term memory in AI —Longer-term memory storage relevant to the agent’s or user’s life. Semantic memory provides a robust capacity to store and retrieve relevant global or local facts and concepts.
Figure 8.13 How memory breaks down into various forms

虽然记忆的检索和增强机制与知识完全相同,但在更新或添加记忆时,两者通常存在显著差异。图 8.14 展示了捕获和使用记忆来增强提示信息的过程。由于记忆的大小通常与完整文档的大小不同,我们可以避免使用任何分割或分块机制。

While memory uses the exact same retrieval and augmentation mechanisms as knowledge, it typically differs significantly when updating or appending memories. Figure 8.14 highlights the process of capturing and using memories to augment prompts. Because memories are typically much smaller than complete documents, we can avoid using any splitting or chunking mechanisms.

Figure 8.14 The basic memory retrieval and augmentation workflow
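The workflow above can be sketched with a toy memory store. The MemoryStore class and bag-of-words vectors here are illustrative stand-ins: a real implementation would use embedding vectors and a vector database such as Chroma DB:

```python
import math
from collections import Counter

def bow_vector(text):
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Minimal memory store: whole statements in, top-n out.
    No splitting or chunking is needed because each memory is
    already a small, self-contained statement."""
    def __init__(self):
        self.memories = []

    def append(self, statement):
        self.memories.append((statement, bow_vector(statement)))

    def retrieve(self, query, top_n=2):
        qv = bow_vector(query)
        ranked = sorted(self.memories,
                        key=lambda m: cosine(qv, m[1]), reverse=True)
        return [text for text, _ in ranked[:top_n]]

store = MemoryStore()
store.append("The user prefers dark mode in every editor.")
store.append("The user's favorite programming language is Python.")
store.append("The user lives in Calgary.")
print(store.retrieve("what programming language does the user like?", top_n=1))
```

The retrieved statements would then be prepended to the prompt, exactly as retrieved document chunks are in the knowledge workflow.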

Nexus 提供了一种类似知识库的机制,允许用户创建可配置用于各种用途和应用的内存存储。它还支持图 8.13 中突出显示的一些更高级的内存形式。下一节将探讨 Nexus 中基本内存存储的工作原理。

Nexus provides a mechanism like the knowledge store, allowing users to create memory stores that can be configured for various uses and applications. It also supports some of the more advanced memory forms highlighted in figure 8.13. The following section will examine how basic memory stores work in Nexus.

8.6.1在 Nexus 中消耗内存存储

8.6.1 Consuming memory stores in Nexus

在Nexus中,记忆存储的运作和构建方式与知识存储类似。它们都高度依赖于检索模式。不同之处在于,记忆系统在构建新记忆时需要额外的步骤。

Memory stores operate and are constructed like knowledge stores in Nexus. They both heavily rely on the retrieval pattern. What differs is the extra steps memory systems take to build new memories.

启动 Nexus,如果需要安装,请参考清单 8.9。登录后,选择“内存”页面,创建一个新的内存存储,如图 8.15 所示。选择一个代理引擎,然后添加一些个人信息和偏好设置。

Go ahead and start Nexus, and refer to listing 8.9 if you need to install it. After logging in, select the Memory page, and create a new memory store, as shown in figure 8.15. Select an agent engine, and then add a few personal facts and preferences about yourself.

Figure 8.15 Adding memories to a newly created memory store

我们需要代理(LLM)的原因已在图 8.14 中展示。当信息被输入到存储器中时,通常会通过 LLM 使用记忆功能进行处理,其目的是将语句/对话处理成与存储器类型相关的语义相关信息。

The reason we need an agent (LLM) was shown in figure 8.14 earlier. When information is fed into a memory store, it’s generally processed through an LLM using a memory function, whose purpose is to process the statements/conversations into semantically relevant information related to the type of memory.

清单 8.10 展示了用于从对话中提取信息并存储到记忆中的对话记忆功能。是的,这只是发送给 LLM 的提示信息的头部部分,指示它如何从对话中提取信息。

Listing 8.10 shows the conversational memory function used to extract information from a conversation into memories. Yes, this is just the header portion of the prompt sent to the LLM, instructing it how to extract information from a conversation.

Listing 8.10 The conversational memory function
Summarize the conversation and create a set of statements that summarize 
the conversation. Return a JSON object with the following keys: 'summary'. 
Each key should have a list of statements that are relevant to that 
category. Return only the JSON object and nothing else.
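The round trip implied by this prompt can be sketched with a stubbed LLM call; call_llm here is a placeholder that returns a canned response, not a real API:

```python
import json

MEMORY_FUNCTION = (
    "Summarize the conversation and create a set of statements that "
    "summarize the conversation. Return a JSON object with the following "
    "keys: 'summary'. Each key should have a list of statements that are "
    "relevant to that category. Return only the JSON object and nothing else."
)

def call_llm(prompt):
    """Placeholder for a real chat-completion call; returns a canned
    response shaped the way the prompt demands."""
    return '{"summary": ["The user lives in Calgary.", "The user likes Python."]}'

def extract_memories(conversation):
    raw = call_llm(MEMORY_FUNCTION + "\n\nConversation:\n" + conversation)
    data = json.loads(raw)   # fails loudly if the model added extra text
    return data["summary"]

for statement in extract_memories("User: I live in Calgary and love Python."):
    print(statement)
```

Each returned statement would then be embedded and appended to the memory store, which is why the prompt insists on pure JSON and nothing else.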

在您生成一些与自身相关的记忆后,返回 Nexus 中的聊天区域,启用my_memory记忆存储,看看智能体对您的了解程度。图 8.16 显示了使用不同智能体引擎的示例对话。

After you generate a few relevant memories about yourself, return to the Chat area in Nexus, enable the my_memory memory store, and see how well the agent knows you. Figure 8.16 shows a sample conversation using a different agent engine.

Figure 8.16 Conversing with a different agent over the same memory store

这是一个基本的记忆模式示例,它从对话中提取事实/偏好,并将它们作为记忆存储在向量数据库中。图 8.13 中展示的记忆模式还有许多其他实现方式。我们将在下一节中实现它们。

This is an example of a basic memory pattern that extracts facts/preferences from conversations and stores them in a vector database as memories. Numerous other implementations of memory follow those displayed earlier in figure 8.13. We’ll implement those in the next section.

8.6.2语义记忆及其在语义记忆、情景记忆和程序记忆中的应用

8.6.2 Semantic memory and applications to semantic, episodic, and procedural memory

心理学家根据记忆内容的不同,将记忆分为多种形式。语义记忆、情景记忆和程序性记忆分别代表不同类型的信息。情景记忆记录的是事件,程序性记忆记录的是过程或步骤,而语义记忆则代表意义,可能包含感觉或情绪。其他形式的记忆(例如地理空间记忆)在此未作详细描述,但未来可能会涉及。

Psychologists categorize memory into multiple forms, depending on what information is remembered. Semantic, episodic, and procedural memory all represent different types of information. Episodic memories are about events, procedural memories are about processes or steps, and semantic memories represent meaning and could include feelings or emotions. Other forms of memory (geospatial, for example) aren't described here but could be.

由于这些记忆依赖于额外的分类层级,它们也依赖于另一层语义分类。一些平台,例如语义内核(SK),将其称为语义记忆。这可能会造成混淆,因为语义分类也被用于提取情景记忆和程序性记忆。

Because these memories rely on an additional level of categorization, they also rely on another level of semantic categorization. Some platforms, such as Semantic Kernel (SK), refer to this as semantic memory. This can be confusing because semantic categorization is also applied to extract episodic and procedural memories.

图 8.17 展示了语义记忆分类过程,有时也称为语义记忆。语义记忆与常规记忆的区别在于,语义记忆增加了一个步骤,即对输入进行语义处理,并提取可用于查询记忆相关向量数据库的相关问题。

Figure 8.17 shows the semantic memory categorization process, also sometimes called semantic memory. The difference between semantic memory and regular memory is the additional step of processing the input semantically and extracting relevant questions that can be used to query the memory-relevant vector database.

Figure 8.17 How semantic memory augmentation works

使用语义增强的优势在于能够提取更多相关的记忆。我们可以通过重新进入 Nexus 并创建一个新的语义记忆存储来验证这一点。

The benefit of using semantic augmentation is the increased ability to extract more relevant memories. We can see this in operation by jumping back into Nexus and creating a new semantic memory store.
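This two-step augmentation can be sketched as follows. Both generate_questions and the memory_lookup callback are illustrative stubs: in practice, the questions would come from an LLM and the lookup from a vector-database similarity search:

```python
def generate_questions(user_input):
    """Placeholder for the LLM step that turns the input into the
    questions used to query the vector store. A real implementation
    would prompt an LLM; this canned mapping shows the shape of it."""
    return ["Where does the user live?", "What does the user enjoy?"]

def semantic_augment(user_input, memory_lookup):
    """Query the store once per generated question and merge the hits."""
    retrieved = []
    for question in generate_questions(user_input):
        for memory in memory_lookup(question):
            if memory not in retrieved:
                retrieved.append(memory)
    return retrieved

memories = {
    "Where does the user live?": ["The user lives in Calgary."],
    "What does the user enjoy?": ["The user enjoys hiking."],
}
hits = semantic_augment("Tell me about me.", lambda q: memories.get(q, []))
print(hits)  # → ['The user lives in Calgary.', 'The user enjoys hiking.']
```

Because one vague input fans out into several pointed questions, the store is queried from multiple semantic angles, which is what surfaces the additional relevant memories.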

图 8.18 展示了如何使用语义记忆配置新的记忆存储。目前,您还无法配置记忆、增强和摘要的具体功能提示。不过,阅读每个功能提示有助于您了解它们的工作原理。

Figure 8.18 shows how to configure a new memory store using semantic memory. As of yet, you can’t configure the specific function prompts for memory, augmentation, and summarization. However, it can be useful to read through each of the function prompts to gain a sense of how they work.

Figure 8.18 Configuration to change the memory store type to semantic

现在,如果您返回并添加事实和偏好,它们将转换为相关记忆类型的语义。图 8.19 展示了同一组语句如何填充到两种不同形式的记忆中。通常,输入到记忆中的语句会更具体地反映记忆的形式。

Now, if you go back and add facts and preferences, they will convert to the semantics of the relevant memory type. Figure 8.19 shows an example of memories being populated for the same set of statements into two different forms of memory. Generally, the statements entered into memory would be more specific to the form of memory.

Figure 8.19 Comparing memories of the same information under two different memory types

记忆和知识可以显著帮助智能体应对各种应用场景。事实上,一个单一的记忆/知识存储系统可以为一个或多个智能体提供信息,从而允许对这两种存储系统进行更精细的个性化解读。接下来,我们将讨论记忆/知识压缩,以此结束本章。

Memory and knowledge can significantly assist an agent with various application types. Indeed, a single memory/knowledge store could feed one or multiple agents, allowing for further specialized interpretations of both types of stores. We’ll finish out the chapter by discussing memory/knowledge compression next.

8.7理解记忆和知识压缩

8.7 Understanding memory and knowledge compression

就像我们自身的记忆一样,随着时间的推移,记忆库也会被冗余信息和大量无关细节所淹没。在内部,我们的大脑会通过压缩或概括记忆来处理这些记忆碎片。我们的大脑会记住更重要的细节,以及更频繁地提取的记忆。

Much like our own memory, memory stores can become cluttered with redundant information and numerous unrelated details over time. Internally, our minds deal with memory clutter by compressing or summarizing memories. Our minds favor more significant details over less important ones, as well as memories that are accessed more frequently.

我们可以将类似的记忆压缩原理应用于智能体记忆和其他检索系统,以提取重要细节。这种压缩原理类似于语义增强,但它在预先聚类相关记忆的基础上增加了一个层级,使这些记忆能够被整体概括。

We can apply similar principles of memory compression to agent memory and other retrieval systems to extract significant details. The principle of compression is similar to semantic augmentation but adds another layer that first clusters groups of related memories so they can be summarized collectively.

图 8.20 展示了记忆/知识压缩的过程。首先,使用诸如 k-means 之类的算法对记忆或知识进行聚类。然后,将这些记忆组输入压缩函数,该函数将信息汇总并整合为更简洁的表示形式。

Figure 8.20 shows the process of memory/knowledge compression. Memories or knowledge are first clustered using an algorithm such as k-means. Then, the groups of memories are passed through a compression function, which summarizes and collects the items into more succinct representations.

Figure 8.20 The memory and knowledge compression process
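A toy version of this pipeline follows, with a deterministic k-means on 2D points standing in for clustering in embedding space, and a simple join standing in for the LLM summarization step:

```python
def kmeans(points, k, iters=10):
    """Tiny k-means on 2D points with deterministic initialization,
    a stand-in for the optimal-k clustering a real system would use."""
    centroids = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                  + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

def compress(cluster_memories):
    """Placeholder compression: a real system would ask an LLM to
    summarize each cluster; here we just join the items."""
    return " / ".join(cluster_memories)

# Memories paired with (illustrative) 2D embedding points
memories = [
    ("User likes hiking.",       (0.1, 0.2)),
    ("User enjoys long hikes.",  (0.2, 0.1)),
    ("User hikes on weekends.",  (0.15, 0.15)),
    ("User works in Python.",    (0.9, 0.8)),
    ("User codes Python daily.", (0.85, 0.9)),
]
clusters = kmeans([p for _, p in memories], k=2)
compressed = [
    compress([text for text, p in memories if p in cluster])
    for cluster in clusters
]
print([len(c) for c in clusters])  # → [3, 2]
```

The three hiking memories collapse into one compressed entry and the two Python memories into another, which is exactly the dedup­lication effect the compression step is after.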

Nexus 使用 k-means 最优聚类算法对知识库和内存存储进行压缩。图 8.21 显示了内存压缩界面。在该界面中,您可以看到以 3D 形式显示并按聚类方式排列的项目。左侧表格显示了聚类的大小(项目数量)。

Nexus provides for both knowledge and memory store compression using k-means optimal clustering. Figure 8.21 shows the compression interface for memory. Within the compression interface, you’ll see the items displayed in 3D and clustered. The size (number of items) of the clusters is shown in the left table.

Figure 8.21 The memory compression interface

如果集群中的项目数量庞大或分布不均,通常建议压缩内存甚至知识。压缩的具体用例可能因内存的用途和应用而异。但一般来说,如果对存储中的项目进行检查发现重复或冗余信息,则适合进行压缩。以下总结了一些可以从压缩中获益的应用用例。

Compressing memories and even knowledge is generally recommended if the number of items in a cluster is large or unbalanced. Each use case for compression may vary depending on the use and application of memories. Generally, though, if an inspection of the items in a store contains repetitive or duplicate information, it’s a good time for compression. The following is a summary of use cases for applications that would benefit from compression.

知识压缩的必要性

The case for knowledge compression

知识检索和增强已被证明能显著受益于压缩。具体效果会因应用场景而异,但一般来说,知识源越冗长,压缩带来的益处就越大。包含文学散文的文档,例如故事和小说,比代码库更能从中获益。然而,如果代码本身也高度重复,压缩同样可能带来益处。

Knowledge retrieval and augmentation have also been shown to benefit significantly from compression. Results will vary by use case, but generally, the more verbose the source of knowledge, the more it will benefit from compression. Documents that feature literary prose, such as stories and novels, will benefit more than, say, a base of code. However, if the code is likewise very repetitive, compression could also be shown to be beneficial.

关于多久施加一次压力的案例

The case for how often you apply compression

内存通常会受益于定期压缩,而知识库通常只在首次加载时起作用。压缩的频率很大程度上取决于内存的使用情况、频率和容量。

Memory will often benefit from periodic compression, whereas knowledge stores typically benefit from compression only on the first load. How frequently you apply compression will depend greatly on the memory's use, frequency, and quantity.

多次施加压缩的理由

The case for applying compression more than once

同时进行多次压缩已被证明可以提高检索性能。其他模式也表明,可以利用不同压缩级别的内存或知识。例如,知识库被压缩两次,从而得到三个不同级别的知识。

Multiple passes of compression at the same time have been shown to improve retrieval performance. Other patterns have also suggested using memory or knowledge at various levels of compression. For example, a knowledge store compressed two times results in three different levels of knowledge.

融合知识和记忆压缩的必要性

The case for blending knowledge and memory compression

如果一个系统专门针对特定的知识来源,并且该系统还使用了内存,那么可以通过进一步优化来整合存储。另一种方法是直接将文档的初始知识填充到内存中。

If a system is specialized to a particular source of knowledge and that system also employs memories, there may be further optimization to consolidate stores. Another approach is to populate memory with the starting knowledge of a document directly.

多个存储器或知识存储的必要性

The case for multiple memory or knowledge stores

在更高级的系统中,我们将探讨智能体如何运用与其工作流程相关的多个记忆和知识存储。例如,智能体可以在与不同用户对话时使用独立的记忆存储,甚至可能具备与不同用户群体共享不同记忆组的能力。记忆和知识检索是智能体系统的基石,现在我们可以总结一下前面讨论的内容,并在下一节回顾一些练习。

In more advanced systems, we’ll look at agents employing multiple memory and knowledge stores relevant to their workflow. For example, an agent could employ individual memory stores as part of its conversations with individual users, perhaps including the ability to share different groups of memory with different groups of individuals. Memory and knowledge retrieval are cornerstones of agentic systems, and we can now summarize what we covered and review some learning exercises in the next section.

8.8练习

8.8 Exercises

通过以下练习来加深你对相关内容的理解:

Use the following exercises to improve your knowledge of the material:

  • 练习 1  — 加载和拆分不同文档(中级)
  • Exercise 1 —Load and Split a Different Document (Intermediate)

目标——利用 LangChain 了解文档拆分对检索效率的影响。

ObjectiveUnderstand the effect of document splitting on retrieval efficiency by using LangChain.

任务

Tasks:

    • 选择其他文档(例如,新闻文章、科学论文或短篇小说)。
    • 使用 LangChain 加载文档并将其分割成块。
    • 分析文档是如何被分割成多个部分的,以及这种分割方式如何影响检索过程。
    • Select a different document (e.g., a news article, a scientific paper, or a short story).
    • Use LangChain to load and split the document into chunks.
    • Analyze how the document is split into chunks and how it affects the retrieval process.
  • 练习 2—— 语义搜索实验(中级)
  • Exercise 2 —Experiment with Semantic Search (Intermediate)

目标——通过执行语义搜索来比较各种向量化技术的有效性。

ObjectiveCompare the effectiveness of various vectorization techniques by performing semantic searches.

任务

Tasks:

    • 选择一组用于语义搜索的文档。
    • 使用 Word2Vec 或 BERT 嵌入等向量化方法代替 TF-IDF。
    • 进行语义搜索,并将结果与​​使用 TF-IDF 获得的结果进行比较,以了解差异和有效性。
    • Choose a set of documents for semantic search.
    • Use a vectorization method such as Word2Vec or BERT embeddings instead of TF–IDF.
    • Perform the semantic search, and compare the results with those obtained using TF–IDF to understand the differences and effectiveness.
  • 练习 3—— 实现自定义 RAG 工作流程(高级)
  • Exercise 3 —Implement a Custom RAG Workflow (Advanced)

目标——运用 LangChain 在实际环境中应用 RAG 的理论知识。

ObjectiveApply theoretical knowledge of RAG in a practical context using LangChain.

任务

Tasks:

    • 选择具体应用场景(例如,客户服务咨询或学术研究查询)。
    • 使用 LangChain 设计并实现自定义 RAG 工作流程。
    • 根据所选应用调整工作流程,并测试其有效性。
    • Choose a specific application (e.g., customer service inquiries or academic research queries).
    • Design and implement a custom RAG workflow using LangChain.
    • Tailor the workflow to suit the chosen application, and test its effectiveness.
  • 练习 4  — 构建知识库并尝试拆分模式(中级)
  • Exercise 4 —Build a Knowledge Store and Experiment with Splitting Patterns (Intermediate)

目标——了解不同的拆分模式和压缩方式如何影响知识检索。

ObjectiveUnderstand how different splitting patterns and compression affect knowledge retrieval.

任务

Tasks:

    • 构建一个知识库,并向其中填充几个文档。
    • 尝试不同的分割/组块模式,并分析它们对检索的影响。
    • 压缩知识库,并观察对查询性能的影响。
    • Build a knowledge store, and populate it with a couple of documents.
    • Experiment with different forms of splitting/chunking patterns, and analyze their effect on retrieval.
    • Compress the knowledge store, and observe the effects on query performance.
  • 练习 5—— 构建和测试各种存储器(高级)
  • Exercise 5 —Build and Test Various Memory Stores (Advanced)

目标——了解不同内存存储类型的独特性和应用案例。

ObjectiveUnderstand the uniqueness and use cases of different memory store types.

任务

Tasks:

    • 构建各种形式的记忆存储(对话式、语义式、情景式和程序式)。
    • 使用每种类型的内存存储与代理进行交互,并观察差异。
    • 压缩内存存储,并分析其对内存检索的影响。
    • Build various forms of memory stores (conversational, semantic, episodic, and procedural).
    • Interact with an agent using each type of memory store, and observe the differences.
    • Compress the memory store, and analyze the effect on memory retrieval.
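The distinction between memory store types can be sketched with two toy classes: a short-term conversational buffer that forgets old exchanges, and a long-term store that retrieves by relevance. These are illustrative stand-ins, not Nexus's implementation, and the "semantic" store here ranks by crude token overlap where a real one would use embedding similarity.

```python
from collections import deque

class ConversationalMemory:
    """Short-term memory: keeps only the last `capacity` exchanges."""
    def __init__(self, capacity=5):
        self.buffer = deque(maxlen=capacity)

    def add(self, role, text):
        self.buffer.append((role, text))

    def context(self):
        return "\n".join(f"{role}: {text}" for role, text in self.buffer)

class SemanticMemory:
    """Long-term memory: recalls items by crude token overlap with the query."""
    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append(text)

    def recall(self, query, top_k=1):
        q = set(query.lower().split())
        ranked = sorted(self.items,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return ranked[:top_k]

short = ConversationalMemory(capacity=2)
short.add("user", "hi")
short.add("agent", "hello")
short.add("user", "recommend a movie")  # evicts the oldest entry

long_term = SemanticMemory()
long_term.add("user likes time travel movies")
long_term.add("user is allergic to peanuts")
print(long_term.recall("what movies does the user like"))
```

Interacting with an agent backed by each store shows the difference directly: the buffer shapes the immediate conversation, while the long-term store surfaces facts from past sessions.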

概括

Summary

  • AI 应用中的记忆区分非结构化记忆和结构化记忆,突出了它们在为更相关的交互提供上下文提示方面的用途。
  • Memory in AI applications differentiates between unstructured and structured memory, highlighting their use in contextualizing prompts for more relevant interactions.
  • 检索增强生成 (RAG) 是一种利用向量嵌入和相似性搜索来检索相关内容,从而增强外部文档上下文提示的机制。
  • Retrieval augmented generation (RAG) is a mechanism for enhancing prompts with context from external documents, using vector embeddings and similarity search to retrieve relevant content.
  • 语义搜索与文档索引结合,利用 TF-IDF 和余弦相似度将文档转换为语义向量,从而增强了对索引文档进行语义搜索的能力。
  • Semantic search with document indexing converts documents into semantic vectors using TF–IDF and cosine similarity, enhancing the capability to perform semantic searches across indexed documents.
  • 向量数据库和相似性搜索将文档向量存储在向量数据库中,从而实现高效的相似性搜索并提高检索准确率。
  • Vector databases and similarity search stores document vectors in a vector database, facilitating efficient similarity searches and improving retrieval accuracy.
  • 文档嵌入利用 OpenAI 等模型生成能够保留文档上下文并促进语义相似性搜索的嵌入,从而捕捉语义含义。
  • Document embeddings capture semantic meanings, using models such as OpenAI’s models to generate embeddings that preserve a document’s context and facilitate semantic similarity searches.
  • LangChain 提供了多种执行 RAG 的工具,并且抽象了检索过程,从而可以轻松地在各种数据源和向量存储中实现 RAG 和内存系统。
  • LangChain provides several tools for performing RAG, and it abstracts the retrieval process, allowing for easy implementation of RAG and memory systems across various data sources and vector stores.
  • LangChain 中的短期记忆和长期记忆在 LangChain 内部实现了对话记忆,区分了短期缓冲模式和长期存储解决方案。
  • Short-term and long-term memory in LangChain implements conversational memory within LangChain, distinguishing between short-term buffering patterns and long-term storage solutions.
  • 在人工智能应用中实现可扩展的检索系统,将文档向量存储在数据库中以进行高效的相似性搜索至关重要。
  • Storing document vectors in databases for efficient similarity searches is crucial for implementing scalable retrieval systems in AI applications.
  • 代理知识直接与对文档或其他文本信息进行问答的一般 RAG 模式相关。
  • Agent knowledge directly relates to the general RAG pattern of performing question and answer on documents or other textual information.
  • 代理记忆是与 RAG 相关的一种模式,它捕捉代理与用户、自身和其他系统的交互。
  • Agent memory is a pattern related to RAG that captures the agentic interactions with users, itself, and other systems.
  • Nexus 是一个实现智能体知识和记忆系统的平台,包括建立用于文档检索的知识库和用于各种形式记忆的记忆库。
  • Nexus is a platform that implements agentic knowledge and memory systems, including setting up knowledge stores for document retrieval and memory stores for various forms of memory.
  • 语义记忆增强(语义记忆)区分不同类型的记忆(语义记忆、情景记忆、程序性记忆)。它通过语义增强来实现这些区分,从而增强智能体回忆和使用与记忆性质相关的特定信息的能力。
  • Semantic memory augmentation (semantic memory) differentiates between various types of memories (semantic, episodic, procedural). It implements them through semantic augmentation, enhancing agents’ ability to recall and use information relevantly specific to the nature of the memories.
  • 内存和知识压缩是将存储在内存和知识系统中的信息进行精简的技术,通过聚类和摘要来提高检索效率和相关性。
  • Memory and knowledge compression are techniques for condensing information stored in memory and knowledge systems, improving retrieval efficiency and relevancy through clustering and summarization.

9. 掌握代理提示流程

9 Mastering agent prompts with prompt flow

本章内容

This chapter covers

  • 了解系统化的提示工程并设置您的第一个提示流程
  • Understanding systematic prompt engineering and setting up your first prompt flow
  • 如何撰写有效的个人资料/人物角色提示
  • Crafting an effective profile/persona prompt
  • 评估个人资料:评分标准和依据
  • Evaluating profiles: Rubrics and grounding
  • 对大型语言模型配置文件进行基础评估
  • Grounding evaluation of a large language model profile
  • 比较提示:打造完美个人资料
  • Comparing prompts: Getting the perfect profile

本章我们将深入探讨“系统性测试变更”提示工程策略。如果您还记得,我们在第二章中介绍了 OpenAI 提示工程框架的总体策略。这些策略对于构建更好的提示,进而构建更完善的智能体配置文件和角色至关重要。理解这一策略对于我们的提示工程之旅至关重要。

In this chapter, we delve into the Test Changes Systematically prompt engineering strategy. If you recall, we covered the grand strategies of the OpenAI prompt engineering framework in chapter 2. These strategies are instrumental in helping us build better prompts and, consequently, better agent profiles and personas. Understanding this role is key to our prompt engineering journey.

系统性地测试变更是一项核心要素,微软为此开发了一款名为“提示流” (Prompt Flow)的工具,本章稍后将对此进行详细介绍。在了解提示流之前,我们需要理解为什么需要系统性的提示工程。

Test Changes Systematically is such a core facet of prompt engineering that Microsoft developed a tool around this strategy called prompt flow, described later in this chapter. Before getting to prompt flow, we need to understand why we need systemic prompt engineering.

9.1 为什么我们需要系统性的提示工程

9.1 Why we need systematic prompt engineering

提示设计本质上是一个迭代过程。在构建提示时,您通常需要反复迭代和评估。为了更好地理解这一概念,不妨考虑将提示设计简单地应用于 ChatGPT 问题。

Prompt engineering, by its nature, is an iterative process. When building a prompt, you’ll often iterate and evaluate. To see this concept in action, consider the simple application of prompt engineering to a ChatGPT question.

您可以打开浏览器访问 ChatGPT(https://chat.openai.com/),在 ChatGPT 中输入以下(文本)提示,然后点击“发送消息”按钮(图 9.1 左侧显示了此对话的示例):

You can follow along by opening your browser to ChatGPT (https://chat.openai.com/), entering the following (text) prompt into ChatGPT, and clicking the Send Message button (an example of this conversation is shown in figure 9.1, on the left side):

你能推荐一些什么吗?

can you recommend something

图 9.1 应用提示工程和迭代的差异
Figure 9.1 The difference that applying prompt engineering and iteration makes

我们可以看到 ChatGPT 的响应是要求提供更多信息。请打开与 ChatGPT 的新对话,并在右侧输入如图 9.1 所示的提示信息:

We can see that the response from ChatGPT is asking for more information. Go ahead and open a new conversation with ChatGPT, and enter the following prompt, as shown in figure 9.1, on the right side:

请问您能否推荐一部以中世纪为背景的穿越时空电影?

Can you please recommend a time travel movie set in the medieval period.

图 9.1 的结果清晰地展示了省略细节和更具体地提出请求之间的区别。我们只是运用了礼貌地撰写清晰指令的策略,ChatGPT 就为我们提供了很好的建议。但同时也要注意 ChatGPT 本身是如何引导用户进行更有效的提示的。图 9.2 中更新后的屏幕展示了 OpenAI 的提示工程策略。

The results in figure 9.1 show a clear difference between leaving out details and being more specific in your request. We just applied the tactic of politely Writing Clear Instructions, and ChatGPT provided us with a good recommendation. But also notice how ChatGPT itself guides the user into better prompting. The refreshed screen shown in figure 9.2 shows the OpenAI prompt engineering strategies.

图 9.2 OpenAI 提示工程策略,按代理组件细分
Figure 9.2 The OpenAI prompt engineering strategies, broken down by agent component

我们刚刚运用简单的迭代方法改进了提示。我们可以通过使用系统提示/消息来扩展这个例子。图 9.3 展示了系统提示在迭代通信中的用法和作用。在第二章中,我们在多个示例中使用了系统消息/提示。

We just applied simple iteration to improve our prompt. We can extend this example by using a system prompt/message. Figure 9.3 demonstrates the use and role of the system prompt in iterative communication. In chapter 2, we used the system message/prompt in various examples.

图 9.3 LLM 对话中收发的消息以及消息的迭代
Figure 9.3 The messages sent and received in an LLM conversation, and the iteration of messages

您也可以在 ChatGPT 中尝试此操作。这次,请输入以下提示,并在提示中包含小写的单词 system,然后换行(按 Shift-Enter 可在消息窗口中换行而不发送消息):

You can also try this in ChatGPT. This time, enter the following prompt and include the word system in lowercase, followed by a new line (enter a new line in the message window without sending the message by pressing Shift-Enter):

系统

system

你是穿越时空电影方面的专家。

You are an expert on time travel movies.

ChatGPT 会回复一些友好的评论,如图 9.4 所示。因此,它乐于接受新的角色,并询问是否有任何后续问题。现在,请像之前一样输入以下通用提示:

ChatGPT will respond with some pleasant comments, as shown in figure 9.4. Because of this, it’s happy to accept its new role and asks for any follow-up questions. Now enter the following generic prompt as we did previously:

你能推荐一些什么吗?

can you recommend something

图 9.4 在之前的对话中添加系统提示的效果
Figure 9.4 The effect of adding a system prompt to the previous conversation

我们刚刚看到了对提示信息进行迭代优化的过程,也就是提示信息工程,目的是提取更好的回复。这项工作是在使用 ChatGPT 用户界面进行的三次不同对话中完成的。虽然这不是最有效的方式,但它确实有效。

We’ve just seen the iteration of refining a prompt, the prompt engineering, to extract a better response. This was accomplished over three different conversations using the ChatGPT UI. While not the most efficient way, it works.

然而,我们尚未定义评估提示效果并确定提示何时有效的迭代流程。图 9.5 展示了一种利用迭代和评估系统进行提示工程的系统方法。

However, we haven’t defined the iterative flow for evaluating the prompt and determining when a prompt is effective. Figure 9.5 shows a systematic method of prompt engineering using a system of iteration and evaluation.

图 9.5 提示工程的系统方法
Figure 9.5 A systematic method of prompt engineering

迭代和评估提示的系统涵盖了“系统性测试变更”策略的总体框架。评估提示的性能和有效性仍然是一个新兴领域,但我们将借鉴教育领域的一些技巧,例如评分标准和基础教学,这些内容将在本章后续章节中进行探讨。然而,正如下一节所述,在此之前,我们需要理解角色和代理配置文件之间的区别。

The system of iterating and evaluating prompts covers the broad Test Changes Systematically strategy. Evaluating the performance and effectiveness of prompts is still new, but we’ll use techniques from education, such as rubrics and grounding, which we’ll explore in a later section of this chapter. However, as spelled out in the next section, we need to understand the difference between a persona and an agent profile before we do so.
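The iterate-and-evaluate cycle can be sketched as a loop over candidate prompts. Everything in this sketch is a stand-in: `ask_llm` fakes a model with canned replies so the loop is self-contained, and `evaluate` is a toy placeholder for the rubric-based scoring discussed later in the chapter.

```python
def ask_llm(prompt):
    """Stand-in for a real LLM call; replace with your API of choice."""
    canned = {
        "can you recommend something":
            "Could you tell me more about what you like?",
        "recommend a time travel movie set in the medieval period":
            "Timeline (2003) is a time travel story set in medieval France.",
    }
    return canned.get(prompt, "I'm not sure.")

def evaluate(response, required_terms):
    """Toy rubric: fraction of required terms that appear in the response."""
    hits = sum(term in response.lower() for term in required_terms)
    return hits / len(required_terms)

# Iterate over prompt variations and keep the one that scores best
candidates = [
    "can you recommend something",
    "recommend a time travel movie set in the medieval period",
]
required = ["time travel", "medieval"]
best_prompt = max(candidates, key=lambda p: evaluate(ask_llm(p), required))
print(best_prompt)
```

The shape is the point: vary the prompt, run it, score the response, and repeat, rather than eyeballing replies across three separate ChatGPT conversations.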

9.2了解代理人概况和角色

9.2 Understanding agent profiles and personas

代理配置文件是对描述代理的组件提示或消息的封装。它包括代理的角色、特殊指令以及其他可以指导用户或其他代理使用者的策略。

An agent profile is an encapsulation of component prompts or messages that describe an agent. It includes the agent’s persona, special instructions, and other strategies that can guide the user or other agent consumers.

图 9.6 展示了智能体配置文件的主要要素。这些要素与本书中描述的提示工程策略相对应。并非所有智能体都会使用完整智能体配置文件中的所有要素。

Figure 9.6 shows the main elements of an agent profile. These elements map to prompt engineering strategies described in this book. Not all agents will use all the elements of a full agent profile.

图 9.6 代理配置文件的组成部分
Figure 9.6 The components of an agent profile

从根本上讲,智能体配置文件是一组描述智能体的提示信息。它可能包含与行动/工具、知识、记忆、推理、评估、规划和反馈相关的其他外部元素。这些元素的组合构成了完整的智能体提示配置文件。

At a basic level, an agent profile is a set of prompts describing the agent. It may include other external elements related to actions/tools, knowledge, memory, reasoning, evaluation, planning, and feedback. The combination of these elements comprises an entire agent prompt profile.

提示是智能体功能的核心。一个或一组提示驱动着智能体配置文件中的每个组件。对于操作/工具而言,这些提示定义明确,但正如我们所见,记忆和知识的提示会因使用场景的不同而存在显著差异。

Prompts are the heart of an agent’s function. A prompt or set of prompts drives each of the agent components in the profile. For actions/tools, these prompts are well defined, but as we’ve seen, prompts for memory and knowledge can vary significantly by use case.

AI代理配置文件的定义不仅仅是系统提示。提示流不仅允许我们构建构成代理配置文件的提示和代码,还能评估其有效性。下一节,我们将介绍提示流并开始使用它。

The definition of an AI agent profile is more than just a system prompt. Prompt flow allows us not only to construct the prompts and code that comprise the agent profile but also to evaluate their effectiveness. In the next section, we’ll open up prompt flow and start using it.

9.3设置您的第一个提示流程

9.3 Setting up your first prompt flow

Prompt Flow 是微软在其 Azure 机器学习工作室平台中开发的一款工具。该工具后来在 GitHub 上以开源项目的形式发布,并因此获得了更多关注和使用。虽然最初是作为应用程序平台开发的,但它在开发和评估提示/配置文件方面也展现出了强大的实力。

Prompt flow is a tool developed by Microsoft within its Azure Machine Learning Studio platform. The tool was later released as an open source project on GitHub, where it has attracted more attention and use. While initially intended as an application platform, it has since shown its strength in developing and evaluating prompts/profiles.

由于 Prompt Flow 最初是为在 Azure 服务上运行而开发的,因此它具有强大的核心架构。该工具支持多线程批处理,使其成为大规模评估提示的理想选择。以下部分将介绍 Prompt Flow 的入门基础知识。

Because prompt flow was initially developed to run on Azure as a service, it features a robust core architecture. The tool supports multi-threaded batch processing, which makes it ideal for evaluating prompts at scale. The following section will examine the basics of starting with prompt flow.

9.3.1入门

9.3.1 Getting started

在开始本书练习之前,需要完成一些先决条件。本节和本章的相关先决条件如下表所示;请务必在尝试练习之前完成这些条件:

There are a few prerequisites to undertake before working through the exercises in this book. The relevant prerequisites for this section and chapter are shown in the following list; make sure to complete them before attempting the exercises:

  • Visual Studio Code (VS Code)  —有关安装说明,包括其他扩展,请参阅附录 A。
  • Visual Studio Code (VS Code) —Refer to appendix A for installation instructions, including additional extensions.
  • 提示流,VS Code 扩展 ——有关安装扩展的详细信息,请参阅附录 A。
  • Prompt flow, VS Code extension —Refer to appendix A for details on installing extensions.
  • Python 虚拟环境 ——有关设置虚拟环境的详细信息,请参阅附录 A。
  • Python virtual environment —Refer to appendix A for details on setting up a virtual environment.
  • 安装提示流包 ——在您的虚拟环境中,快速执行 pip install,如下所示:
  • Install prompt flow packages —Within your virtual environment, do a quick pip install, as shown here:
pip install promptflow promptflow-tools
pip install promptflow promptflow-tools
  • LLM(GPT-4 或更高版本)  ——您需要通过 OpenAI 或 Azure OpenAI Studio 访问 GPT-4 或更高版本。如果您需要访问这些资源的帮助,请参阅附录 B。
  • LLM (GPT-4 or above) —You’ll need access to GPT-4 or above through OpenAI or Azure OpenAI Studio. Refer to appendix B if you need assistance accessing these resources.
  • 书籍源代码 ——将书籍源代码克隆到本地文件夹;如果您需要克隆存储库的帮助,请参阅附录 A。
  • Book’s source code —Clone the book’s source code to a local folder; refer to appendix A if you need help cloning the repository.

打开 VS Code 并进入本书源代码的 chapter_09 文件夹。确保已连接虚拟环境,并已安装提示流包和扩展。

Open up VS Code to the book’s source code folder, chapter_09. Ensure that you have a virtual environment connected and have installed the prompt flow packages and extension.

首先,您需要在提示流扩展中创建与 LLM 资源的连接。在 VS Code 中打开提示流扩展,然后单击以打开连接。接下来,单击 LLM 资源旁边的加号以创建新连接,如图 9.7 所示。

First, you’ll want to create a connection to your LLM resource within the prompt flow extension. Open the prompt flow extension within VS Code, and then click to open the connections. Then, click the plus sign beside the LLM resource to create a new connection, as shown in figure 9.7.

图 9.7 创建新的提示流 LLM 连接
Figure 9.7 Creating a new prompt flow LLM connection

这将打开一个 YAML 文件,您需要在其中填写连接名称和其他与连接相关的信息。请按照说明操作,并且不要将 API 密钥输入到文档中,如图 9.8 所示。

This will open a YAML file where you’ll need to populate the connection name and other information relevant to your connection. Follow the directions, and don’t enter API keys into the document, as shown in figure 9.8.

图 9.8 设置 LLM 资源的连接信息
Figure 9.8 Setting the connection information for the LLM resource

输入连接信息后,点击文档底部的“创建连接”链接。这将在文档下方打开一个终端提示符,要求您输入密钥。根据您的终端配置,您可能无法粘贴(Ctrl+V 或 Cmd+V)。或者,您可以将鼠标光标悬停在终端上,然后在 Windows 窗口中单击鼠标右键来粘贴密钥。

When the connection information is entered, click the Create Connection link at the bottom of the document. This will open a terminal prompt below the document, asking you to enter your key. Depending on your terminal configuration, you may be unable to paste (Ctrl-V, Cmd-V). Alternatively, you can paste the key by hovering the mouse cursor over the terminal and right-clicking on Windows.

现在我们来测试连接,首先打开 chapter_09/promptflow/simpleflow 文件夹中的简单流程。然后,在 VS Code 中打开 flow.dag.yaml 文件。这是一个 YAML 文件,但提示流扩展提供了一个可视化编辑器,可以通过单击文件顶部的“可视化编辑器”链接来访问,如图 9.9 所示。

We’ll now test the connection by first opening the simple flow in the chapter_09/promptflow/simpleflow folder. Then, open the flow.dag.yaml file in VS Code. This is a YAML file, but the prompt flow extension provides a visual editor that is accessible by clicking the Visual Editor link at the top of the file, as shown in figure 9.9.

图 9.9 打开提示流可视化编辑器
Figure 9.9 Opening the prompt flow visual editor

打开可视化编辑器窗口后,您将看到一个表示流程和流程模块的图表。双击推荐模块,并设置连接名称、API 类型和模型或部署名称,如图 9.10 所示。

After the visual editor window is opened, you’ll see a graph representing the flow and the flow blocks. Double-click the recommender block, and set the connection name, API type, and model or deployment name, as shown in figure 9.10.

图 9.10 设置 LLM 连接详情
Figure 9.10 Setting the LLM connection details

提示流由一系列代码块组成,以一个 Inputs 代码块开始,以一个 Outputs 代码块结束。在这个简单的流程中,recommender 代码块代表 LLM 连接以及用于与模型对话的提示。在这个简单示例中,echo 代码块会回显输入。

A prompt flow is composed of a set of blocks starting with an Inputs block and terminating in an Outputs block. Within this simple flow, the recommender block represents the LLM connection and the prompt used to converse with the model. The echo block for this simple example echoes the input.

在创建与 LLM 的连接时(无论是通过提示流还是 API),我们始终需要考虑以下关键参数(提示流文档:https://microsoft.github.io/promptflow):

When creating a connection to an LLM, either in prompt flow or through an API, here are the crucial parameters we always need to consider (prompt flow documentation: https://microsoft.github.io/promptflow):

  • 连接 — 这是连接名称,同时也代表您要连接的服务。提示流支持多种服务,包括本地部署的 LLM。
  • Connection —This is the connection name, but it also represents the service you’re connecting to. Prompt flow supports multiple services, including locally deployed LLMs.
  • API —— 这是 API 类型。选项包括 chat(用于聊天补全 API,例如 GPT-4)或 completion(用于较旧的补全模型,例如 OpenAI Davinci)。
  • API —This is the API type. The options are chat for a chat completion API, such as GPT-4, or completion for the older completion models, such as the OpenAI Davinci.
  • 模型 — 这可能是模型名称或部署名称,具体取决于您的服务连接。对于 OpenAI,这将是模型名称;对于 Azure OpenAI,这将是部署名称。
  • Model —This may be the model or deployment name, depending on your service connection. For OpenAI, this will be the model’s name, and for Azure OpenAI, it will represent the deployment name.
  • 温度 ——这代表模型响应的随机性或变异性。温度值为 1 表示响应具有较高的变异性,而温度值为 0 表示希望响应没有变异性。这是一个需要理解的关键参数,正如我们将看到的,它会因应用场景而异。
  • Temperature —This represents the stochasticity or variability of the model response. A value of 1 represents a high variability of responses, while 0 indicates a desire for no variability. This is a critical parameter to understand and, as we’ll see, will vary by use case.
  • 停止 — 此可选设置指示 LLM 停止创建令牌。它更适用于较旧的模型和开源模型。
  • Stop —This optional setting tells the call to the LLM to stop creating tokens. It’s more appropriate for older and open source models.
  • 最大令牌数 ——此设置限制对话中使用的令牌数量。了解令牌使用量对于评估 LLM 交互在规模化后的运行情况至关重要。如果您只是在探索和研究阶段,令牌数量可能不是问题。但是,在生产系统中,令牌数量代表了 LLM 的负载,使用大量令牌的连接可能无法很好地扩展。
  • Max tokens —This limits the number of tokens used in a conversation. Knowledge of how many tokens you use is crucial to evaluating how your LLM interactions will work when scaled. Counting tokens may not be a concern if you’re exploring and conducting research. However, in production systems, tokens represent the load on the LLM, and connections using numerous tokens may not scale well.
  • 高级参数 ——您可以设置更多选项来调整与 LLM 的交互,但我们将在本书的后续章节中介绍该主题。
  • Advanced parameters —You can set a few more options to tune your interaction with the LLM, but we’ll cover that topic in later sections of the book.
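These parameters map directly onto the payload of a chat completion request. The sketch below only assembles that payload as a plain dict (the model name and values are illustrative); prompt flow builds the equivalent request for you from the block settings, and the field names follow the OpenAI chat API.

```python
def build_chat_request(model, system, user, temperature=0.0,
                       max_tokens=256, stop=None):
    """Assemble a chat-completion payload from the parameters discussed above."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is typically in the range [0, 2]")
    return {
        "model": model,              # model name (OpenAI) or deployment name (Azure OpenAI)
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,  # 0 = stable output, 1 = highly varied
        "max_tokens": max_tokens,    # token cap; the main load driver at scale
        "stop": stop,                # optional stop sequences, mostly for older models
    }

req = build_chat_request(
    "gpt-4",
    "You are an expert on time travel movies.",
    "can you recommend something",
    temperature=0.7,
)
print(req["model"], req["temperature"])
```

Keeping the payload in one place like this also makes it easy to run the same messages at different temperatures or token limits when comparing variations.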

配置 LLM 模块后,向上滚动到“输入”模块部分,并查看 user_input 字段中显示的主要输入,如图 9.11 所示。将其保留为默认值,然后单击窗口顶部的“播放”按钮。

After configuring the LLM block, scroll up to the Inputs block section, and review the primary input shown in the user_input field, as shown in figure 9.11. Leave it as the default, and then click the Play button at the top of the window.

图 9.11 设置输入并运行流程
Figure 9.11 Setting the input and running the flow

流程中的所有模块都会运行,结果将显示在终端窗口中。您可能会发现,输出结果会推荐一些关于时间旅行的电影。这是因为推荐模块已经设置了一个简单的用户画像,我们将在下一节中了解它的工作原理。

All the blocks in the flow will run, and the results will be shown in the terminal window. What you should find interesting is that the output shows recommendations for time travel movies. This is because the recommender block already has a simple profile set, and we’ll see how that works in the next section.

9.3.2使用 Jinja2 模板创建配置文件

9.3.2 Creating profiles with Jinja2 templates

该流程会根据其使用的提示或配置文件,提供穿越时空电影的推荐。默认情况下,提示流程使用 Jinja2 模板来定义提示的内容,也就是我们所说的配置文件。为了本书以及我们对人工智能代理的探索,我们将这些模板称为流程或代理的配置文件。

The flow responds with time travel movie recommendations because of the prompt or profile it uses. By default, prompt flow uses Jinja2 templates to define the content of the prompt or what we’ll call a profile. For the purposes of this book and our exploration of AI agents, we’ll refer to these templates as the profile of a flow or agent.

虽然 Prompt Flow 没有明确地将自己定义为助手或代理引擎,但它无疑符合生成代理和通用类型代理的标准。正如您将看到的,Prompt Flow 甚至支持将流程部署到容器中以及作为服务部署。

While prompt flow doesn’t explicitly refer to itself as an assistant or agent engine, it certainly meets the criteria of producing a proxy and general types of agents. As you’ll see, prompt flow even supports deployments of flows into containers and as services.

打开 VS Code,进入 chapter_09/promptflow/simpleflow/flow.dag.yaml,并在可视化编辑器中打开该文件。然后,找到“提示”字段,并单击 recommended.jinja2 链接,如图 9.12 所示。

Open VS Code to chapter_09/promptflow/simpleflow/flow.dag.yaml, and open the file in the visual editor. Then, locate the Prompt field, and click the recommended.jinja2 link, as shown in figure 9.12.

图 9.12 打开 Jinja2 提示模板并检查配置文件/提示的各个部分
Figure 9.12 Opening the Jinja2 prompt template and examining the parts of the profile/prompt

Jinja 是一个模板引擎,Jinja2 是该引擎的一个特定版本。模板是定义任何形式文本文档的布局和组成部分的绝佳方式。它们已被广泛用于生成 HTML、JSON、CSS 和其他文档格式。此外,它们还支持直接在模板中应用代码。虽然构建提示或代理配置文件没有标准方法,但本书推荐使用 Jinja 等模板引擎。

Jinja is a templating engine, and Jinja2 is a particular version of that engine. Templates are an excellent way of defining the layout and parts of any form of text document. They have been extensively used to produce HTML, JSON, CSS, and other document forms. In addition, they support the ability to apply code directly into the template. While there is no standard way to construct prompts or agent profiles, our preference in this book is to use templating engines such as Jinja.
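To see what rendering such a template looks like, a few lines of Python turn a Jinja2 profile into a finished prompt. The template text here is illustrative, not the book's actual recommender template; it simply shows placeholders being filled from named inputs, which is roughly what prompt flow does with a .jinja2 file.

```python
from jinja2 import Template

# An illustrative profile template in the style of a prompt flow .jinja2 file
profile = Template(
    "system:\n"
    "You are an expert on {{ subject }} and recommend {{ format }}s.\n"
    "user:\n"
    "{{ user_input }}\n"
)

prompt = profile.render(
    subject="time travel movies",
    format="movie",
    user_input="can you recommend something",
)
print(prompt)
```

Because the profile is just a template plus named inputs, changing the persona means editing one placeholder value rather than rewriting the whole prompt.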

此时,请在 recommended.jinja2 模板的系统提示中更改角色。然后,在可视化编辑器中打开流程并单击“播放”按钮,运行流程的所有模块。下一节将介绍运行提示流以进行测试或实际部署的其他方法。

At this point, change the role within the system prompt of the recommended.jinja2 template. Then, run all blocks of the flow by opening the flow in the visual editor and clicking the Play button. The next section will look at other ways of running prompt flow for testing or actual deployment.

9.3.3部署提示流 API

9.3.3 Deploying a prompt flow API

由于 Prompt Flow 也被设计为一种服务,因此它支持多种快速部署方式,可以部署为应用程序或 API。Prompt Flow 可以部署为从终端运行的本地 Web 应用程序和 API,也可以部署为 Docker 容器。

Because prompt flow was also designed to be deployed as a service, it supports a couple of ways to deploy as an app or API quickly. Prompt flow can be deployed as a local web application and API running from the terminal or as a Docker container.

从 VS Code 的可视化编辑器返回到该flow.dag.yaml文件。在窗口顶部,“运行”按钮旁边有几个选项,我们需要进一步了解。单击图 9.13 所示的“生成”按钮,然后选择部署为本地应用程序。系统将创建一个新的 YAML 文件来配置应用程序。保留默认设置,然后单击“启动本地应用程序”链接。

Return to the flow.dag.yaml file in the visual editor from VS Code. At the top of the window beside the Play button are several options we’ll want to investigate further. Click the Build button as shown in figure 9.13, and then select to deploy as a local app. A new YAML file will be created to configure the app. Leave the defaults, and click the Start Local App link.

图 9.13 构建并启动本地应用程序流程
Figure 9.13 Building and starting the flow as a local app

这将以本地 Web 应用程序的形式启动流程,您会看到一个打开的浏览器标签页,如图 9.14 所示。在标有红色星号的必填字段 user_input 中输入一些文本。点击 Enter 键,等待几秒钟以获取回复。

This will launch the flow as a local web application, and you’ll see a browser tab open, as shown in figure 9.14. Enter some text into the user_input field, which is marked as required with a red asterisk. Click Enter and wait a few seconds for the reply.

图 9.14 以本地 Web 应用程序的形式运行流程
Figure 9.14 Running the flow as a local web application

你应该会看到类似图 9.12 中所示的回复,其中流程或代理会回复一个时间旅行电影列表。这很棒——我们刚刚开发了第一个代理配置文件,相当于一个代理。但是,我们需要确定这些推荐的效果或价值。下一节,我们将探讨如何评估提示和配置文件。

You should see a reply like the one shown earlier in figure 9.12, where the flow or agent replies with a list of time travel movies. This is great—we’ve just developed our first agent profile and the equivalent of a proxy agent. However, we need to determine how successful or valuable the recommendations are. In the next section, we explore how to evaluate prompts and profiles.

9.4评估个人资料:评分标准和依据

9.4 Evaluating profiles: Rubrics and grounding

任何提示或代理配置文件的关键要素在于其完成给定任务的有效性。正如我们在推荐示例中所看到的,提示代理配置文件提供推荐列表相对容易,但要了解这些推荐是否有用,则需要我们评估其响应。

A key element of any prompt or agent profile is how well it performs its given task. As we see in our recommendation example, prompting an agent profile to give a list of recommendations is relatively easy, but knowing whether those recommendations are helpful requires us to evaluate the response.

幸运的是,提示流的设计初衷就是为了大规模评估提示/配置文件。其强大的基础架构支持将 LLM 交互的评估并行化,并将其作为工作进程进行管理,从而能够快速完成数百个配置文件评估及其变体。

Fortunately, prompt flow has been designed to evaluate prompts/profiles at scale. The robust infrastructure allows for the evaluation of LLM interactions to be parallelized and managed as workers, allowing hundreds of profile evaluations and variations to happen quickly.

下一节,我们将探讨如何配置提示流,以便让不同的提示/配置文件相互对比。在评估配置文件的性能之前,我们需要了解这一点。

In the next section, we look at how prompt flow can be configured to run prompt/profile variations against each other. We’ll need to understand this before evaluating profiles’ performance.

提示流提供了一种机制,允许在LLM提示/配置文件中存在多种变体。该工具非常适合比较配置文件变体之间的细微或显著差异。在执行批量评估时,它对于快速评估配置文件的性能至关重要。

Prompt flow provides a mechanism to allow for multiple variations within an LLM prompt/profile. This tool is excellent for comparing subtle or significant differences between profile variations. When used in performing bulk evaluations, it can be invaluable for quickly assessing the performance of a profile.

如图 9.15 所示,在 VS Code 和流程可视化编辑器中打开 recommender_with_variations/flow.dag.yaml 文件。这次,我们将配置文件做得更加通用,并允许在输入级别进行自定义。这样,我们的推荐就可以扩展到任何内容,而不仅仅是穿越时空的电影。

Open the recommender_with_variations/flow.dag.yaml file in VS Code and the flow visual editor, as shown in figure 9.15. This time, we’re making the profile more generalized and allowing for customization at the input level. This allows us to expand our recommendations to anything and not just time travel movies.

图 9.15 带变体的推荐器流程及扩展的输入
Figure 9.15 The recommender with variations flow and the expanded inputs

新增的“主题”、“类型”、“格式”和“自定义”输入项使我们能够定义一个可以轻松调整以适应任何推荐的配置文件。这也意味着我们必须根据推荐用例对这些输入项进行预处理。预处理这些输入项的方法有很多种;图 9.16 展示了两种预处理输入项的示例。图中展示了两种预处理输入项的选项,选项 A 和选项 B。选项 A 代表经典的用户界面;例如,用户可以选择主题或类型等。选项 B 则使用代理/聊天助手与用户进行交互,以便更好地了解用户所需的主题、类型等信息。

The new inputs Subject, Genre, Format, and Custom allow us to define a profile that can easily be adjusted to any recommendation. This also means that we must prime the inputs based on the recommendation use case. There are multiple ways to prime these inputs; two examples of priming inputs are shown in figure 9.16. The figure shows two options, options A and B, for priming inputs. Option A represents the classic UI; perhaps there are objects for the user to select the subject or genre, for example. Option B places a proxy/chat agent to interact with the user better to understand the desired subject, genre, and so on.

图 9.16 用户与代理配置文件交互以预置其输入的选项
Figure 9.16 Options for how users interact with an agent profile to prime its inputs

即使考虑到 LLM 的强大功能,您可能仍然需要或想要使用选项 A。选项 A 的优点在于,您可以像使用任何现代用户界面一样,对输入进行约束和验证。然而,选项 A 的缺点是,这种约束行为可能会限制未来的使用场景。

Even considering the power of LLMs, you may still want or need to use option A. The benefit of option A is that you can constrain and validate the inputs much like you do with any modern UI. Alternatively, the downside of option A is that the constrained behavior may limit and restrict future use cases.

方案 B 提供了一种更流畅、更自然的方式,无需传统的用户界面。它比方案 A 功能更强大、可扩展性更强,但也引入了更多评估上的未知因素。然而,如果方案 B 使用的代理程序编写得当,就能极大地帮助收集用户更优质的信息。

Option B represents a more fluid and natural way without a traditional UI. It’s far more powerful and extensible than option A but also introduces more unknowns for evaluation. However, if the proxy agent that option B uses is written well, it can assist a lot in gathering better information from the user.

您选择的选项将决定您如何评估个人资料。如果您可以接受功能受限的用户界面,那么输入内容也可能仅限于一组离散值。目前,我们假设输入引导选择选项 B,这意味着输入值将由其名称定义。

The option you choose will dictate how you need to evaluate your profiles. If you’re okay with a constrained UI, then it’s likely that the inputs will also be constrained to a set of discrete values. For now, we’ll assume option B for input priming, meaning the input values will be defined by their name.

返回 VS Code 中带有变体的推荐器流程的可视化视图,单击图 9.15 中所示的图标以打开变体并允许编辑。然后,单击 recommend.jinja2 和 recommender_variant_1.jinja2 链接以并排打开文件,如图 9.17 所示。

To get back to VS Code and the visual view of the recommender with variants flow, click the icon shown earlier in figure 9.15 to open the variants and allow editing. Then, click the recommend.jinja2 and recommender_variant_1.jinja2 links to open the files side by side, as shown in figure 9.17.

图 9.17 并排比较推荐器的变体配置文件模板
Figure 9.17 The variant profile templates of the recommender, compared side by side

图 9.17 展示了不同变体配置文件之间的差异。一种配置文件将输入注入用户提示符,另一种则将输入注入系统提示符。然而,必须理解的是,变体不仅仅体现在配置文件设计上,如表 9.1 所示。

Figure 9.17 demonstrates the difference between the variant profiles. One profile injects the inputs into the user prompt, and the other injects them into the system prompt. However, it’s essential to understand that variations can encompass more than profile design, as identified in table 9.1.

表 9.1 提示流中的 LLM 变体选项

Table 9.1 LLM variation options in prompt flow

  • Jinja2 提示模板 ——比较系统提示变体、用户提示变体或混合变体。此处可以应用无数种组合和技巧,提示工程技术也在不断发展。
  • Jinja2 prompt templates —Compare system prompt variations, user prompt variations, or hybrid variations. Countless combinations and tactics can be applied here, and prompt engineering techniques continue to evolve.
  • LLM ——比较 GPT-3.5 和 GPT-4;比较 GPT-4 和 GPT-4 Turbo;比较开源模型和商业模型。这是评估模型性能并使其与提示相匹配的有效方法,还可以帮助您调整配置文件,使其能够与开源模型和/或更经济的模型配合使用。
  • LLM —Compare GPT-3.5 and GPT-4; compare GPT-4 and GPT-4 Turbo; compare open source and commercial models. This is an effective way to evaluate model performance and match it to a prompt. It can also help you adjust a profile to work with open source and/or more economical models.
  • 温度 ——将温度 0(无随机性)与温度 1(最大随机性)进行比较。温度的变化会显著改变某些提示的响应,这可能会提高或降低性能。
  • Temperature —Compare temperature 0 (no randomness) with temperature 1 (maximum randomness). Varying the temperature can significantly change the responses of some prompts, which may improve or reduce performance.
  • 最大令牌数 ——比较较小与较大的令牌上限。这可以帮助您最小化和最大化令牌使用量。
  • Max tokens —Compare small and large token limits. This can help you minimize and maximize token usage.
  • 高级参数 ——比较诸如 top_p、presence_penalty、frequency_penalty 和 logit_bias 等选项。我们将在本书后面的章节中介绍这些高级参数的使用方法。
  • Advanced parameters —Compare options such as top_p, presence_penalty, frequency_penalty, and logit_bias. We’ll cover the use of these advanced parameters in later sections of the book.
  • 函数调用 ——比较不同的函数调用方式。函数调用将在本章后面部分进行讲解。
  • Function calling —Compare different forms of function calling. Function calling is covered later in this chapter.

在这个简单的例子中,我们只会通过改变输入在系统提示或用户提示中的体现方式来使用提示变体。请参考图 9.17 了解其具体形式。然后,我们可以点击顶部的“播放(全部运行)”按钮并选择两者,快速运行这两个变体,如图 9.18 所示。

For this simple example, we’re just going to use prompt variations by varying the input to reflect in either the system or user prompt. Refer to figure 9.17 for what this looks like. We can then quickly run both variations by clicking the Play (Run All) button at the top and choosing both, as shown in figure 9.18.

图 9.18 同时运行两种提示变体
Figure 9.18 Running both prompt variations at the same time

在终端窗口中,您将看到两次运行的结果。结果可能看起来相似,因此我们现在必须进入下一节,讨论如何评估不同版本之间的差异。

In the terminal window, you’ll see the results of both runs. The results will likely look similar, so now we must move on to how we evaluate the difference between variations in the next section.

9.5理解评分标准和基础

9.5 Understanding rubrics and grounding

评估提示/配置文件的性能通常不能简单地用准确率或正确率来衡量。衡量配置文件的性能取决于具体用例和预期结果。如果评估能简单地判断回答是否正确,那就再好不过了。然而,在大多数情况下,评估并非如此简单。

Evaluation of prompt/profile performance isn’t something we can typically do using a measure of accuracy or correct percentage. Measuring the performance of a profile depends on the use case and desired outcome. If that is as simple as determining if the response was right or wrong, all the better. However, in most cases, evaluation won’t be that simple.

在教育领域,评分标准(rubric)的概念定义了一套结构化的标准和准则,学生必须达到这些标准才能获得特定成绩。评分标准也可用于为配置文件或提示的表现提供指导。我们可以按照以下步骤定义一个评分标准,用于评估配置文件或提示的表现:

In education, the rubric concept defines a structured set of criteria and standards a student must establish to receive a particular grade. A rubric can also be used to define a guide for the performance of a profile or prompt. We can follow these steps to define a rubric we can use to evaluate the performance of a profile or prompt:

  1. Identify the purpose and objectives. Determine the goals you want the profile or agent to accomplish. For example, do you want to evaluate the quality of recommendations for a given audience or overall quality for a given subject, format, or other input?
  2. Define criteria. Develop a set of criteria or dimensions that you’ll use to evaluate the profile. These criteria should align with your objectives and provide clear guidelines for assessment. Each criterion should be specific and measurable. For example, you may want to measure a recommendation by how well it fits with the genre and then by subject and format.
  3. Create a scale. Establish a rating scale that describes the levels of performance for each criterion. Standard scales include numerical scales (e.g., 1–5) or descriptive scales (e.g., Excellent, Good, Fair, Poor).
  4. Provide descriptions. For each level on the scale, provide clear and concise descriptions that indicate what constitutes a strong performance and what represents a weaker performance for each criterion.
  5. Apply the rubric. When assessing a prompt or profile, use the rubric to evaluate the prompt’s performance based on the established criteria. Assign scores or ratings for each criterion, considering the descriptions for each level.
  6. Calculate the total score. Depending on your rubric, you may calculate a total score by summing up the scores for each criterion or using a weighted average if some criteria are more important than others.
  7. Ensure evaluation consistency. If multiple evaluators are assessing the profile, it’s crucial to ensure consistency in grading.
  8. Review, revise, and iterate. Periodically review and revise the rubric to ensure it aligns with your assessment goals and objectives. Adjust as needed to improve its effectiveness.
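A rubric defined this way can also be captured as a simple data structure. Here is a minimal sketch in Python, assuming a 1–5 numeric scale and the subject/format/genre criteria used by the recommender example in this chapter; the `validate_score` helper is illustrative, not code from the book:

```python
# A rubric captured as plain data: objective, scale, and criteria.
# The criteria and scale mirror the recommender example; the helper is illustrative.
rubric = {
    "objective": "Evaluate how well a recommendation fits the request",
    "scale": {1: "Poor", 2: "Bad", 3: "Mediocre", 4: "Good", 5: "Excellent"},
    "criteria": ["subject", "format", "genre"],
}

def validate_score(criterion, score, rubric):
    """Check that a score targets a known criterion and sits on the rubric's scale."""
    if criterion not in rubric["criteria"]:
        raise ValueError(f"unknown criterion: {criterion}")
    if score not in rubric["scale"]:
        raise ValueError(f"score {score} is off the scale")
    return score

print(validate_score("genre", 4, rubric))  # prints 4
```

Keeping the rubric as data makes it straightforward to render into an evaluation prompt later or to sanity-check scores returned by an evaluator.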

Grounding is a concept that can be applied to profile and prompt evaluation—it defines how well a response is aligned with a given rubric’s specific criteria and standards. You can also think of grounding as the baseline expectation of a prompt or profile output.

This list summarizes some other important considerations when using grounding with profile evaluation:

  • Grounding refers to aligning responses with the criteria, objectives, and context defined by the rubric and prompt.
  • Grounding involves assessing whether the response directly addresses the rubric criteria, stays on topic, and adheres to any provided instructions.
  • Evaluators and evaluations gauge the accuracy, relevance, and adherence to standards when assessing grounding.
  • Grounding ensures that the response output is firmly rooted in the specified context, making the assessment process more objective and meaningful.

A well-grounded response aligns with all the rubric criteria within the given context and objectives. A poorly grounded response misses some or all of the criteria, context, and objectives.

As the concepts of rubrics and grounding may still be abstract, let’s look at applying them to our current recommender example. The following list applies the rubric-definition process to our recommender example:

  1. Identify the purpose and objectives. The purpose of our profile/prompt is to recommend three top items given a subject, format, genre, and custom input.
  2. Define criteria. For simplicity, we’ll evaluate how a particular recommendation aligns with the given input criteria, subject, format, and genre. For example, if a profile recommends a book when asked for a movie format, we expect a low score in the format criteria.
  3. Create a scale. Again, keeping things simple, we’ll use a scale of 1–5 (1 is poor, and 5 is excellent).
  4. Provide descriptions. See the general descriptions for the rating scale shown in table 9.2.
  5. Apply the rubric. With the rubric assigned at this stage, it’s an excellent exercise to evaluate the rubric against recommendations manually.
  6. Calculate the total score. For our rubric, we’ll average the score for all criteria to provide a total score.
  7. Ensure evaluation consistency. The technique we’ll use for evaluation will provide very consistent results.
  8. Review, revise, and iterate. We’ll review, compare, and iterate on our profiles, rubrics, and the evaluations themselves.
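The total-score step can be computed as a plain average or, if some criteria matter more than others, as a weighted average. A minimal sketch follows; the weights shown are illustrative, not values from the book:

```python
def total_score(scores, weights=None):
    """Average the criterion scores; apply weights when provided."""
    if weights is None:
        return round(sum(scores.values()) / len(scores), 2)
    weighted = sum(weights[k] * v for k, v in scores.items())
    return round(weighted / sum(weights.values()), 2)

scores = {"subject": 5, "format": 5, "genre": 4}
print(total_score(scores))  # plain average -> 4.67
print(total_score(scores, weights={"subject": 2, "format": 1, "genre": 1}))  # -> 4.75
```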
Table 9.2 The rating scale

Rating  Description
1       Poor alignment: This is the opposite of what you would expect given the criteria.
2       Bad alignment: This doesn’t fit the given criteria.
3       Mediocre alignment: It may or may not fit the given criteria.
4       Good alignment: It may not be 100% aligned with the criteria, but it otherwise fits well.
5       Excellent alignment: This is a good recommendation given the criteria.
This basic rubric can now be applied to evaluate the responses for our profile. You can do this manually, or as you’ll see in the next section, using a second LLM profile.

9.6 Grounding evaluation with an LLM profile

This section will employ another LLM prompt/profile for evaluation and grounding. This second LLM prompt will add another block after the recommendations are generated. It will process the generated recommendations and evaluate each one, given the previous rubric.

Before GPT-4 and other sophisticated LLMs came along, we would have never considered using another LLM prompt to evaluate or ground a profile. You often want to use a different model when using LLMs to ground a profile. However, if you’re comparing profiles against each other, using the same LLM for evaluation and grounding is appropriate.

Open the recommender_with_LLM_evaluation\flow.dag.yaml file in the prompt flow visual editor, scroll down to the evaluate_recommendation block, and click the evaluate_recommendation.jinja2 link to open the file, as shown in figure 9.19. Each section of the rubric is identified in the figure.

Figure 9.19 The evaluation prompt, with each section of the rubric identified

We have a rubric that is not only well defined but also in the form of a prompt that can be used to evaluate recommendations. This allows us to evaluate the effectiveness of recommendations for a given profile—automatically. Of course, you can also use the rubric to score and evaluate the recommendations manually for a better baseline.
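The actual evaluate_recommendation.jinja2 prompt is shown in figure 9.19 rather than reproduced in the text, but its general shape can be sketched. The template below is a hypothetical reconstruction of a rubric-as-prompt, not the file from the book’s repository:

```python
# Hypothetical sketch of a rubric embedded in an evaluation prompt; the real
# evaluate_recommendation.jinja2 in the book's repository is worded differently.
EVAL_TEMPLATE = """You are grading recommendations against a rubric.
Score each recommendation's subject, format, and genre fit on a 1-5 scale
(1 = poor alignment, 5 = excellent alignment).

Inputs: subject={subject}, format={format}, genre={genre}

Recommendations:
{recommendations}

For each item, reply with Title:, Subject:, Format:, and Genre: lines."""

prompt = EVAL_TEMPLATE.format(
    subject="time travel",
    format="movies",
    genre="sci-fi",
    recommendations="Title: Primer\nTitle: Time Bandits",
)
print("time travel" in prompt)  # -> True
```

The point is that the rubric’s scale, criteria, and output format all live inside the prompt text, so a second LLM can apply them mechanically.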

Returning to the recommender_with_LLM_evaluation flow visual editor, we can run the flow by clicking the Play button and observing the output. You can run a single recommendation or run both variations when prompted. The output of a single evaluation using the default inputs is shown in the following listing.

Listing 9.1 The LLM rubric evaluation results
{
    "recommendations": "Title: The Butterfly Effect
Subject: 5
Format: 5
Genre: 4

Title: Primer
Subject: 5
Format: 5
Genre: 4

Title: Time Bandits
Subject: 5
Format: 5
Genre: 5"
}

We now have a rubric for grounding our recommender, and the evaluation is run automatically using a second LLM prompt. In the next section, we look at how to perform multiple evaluations simultaneously and then at a total score for everything.

9.7 Comparing profiles: Getting the perfect profile

With our understanding of rubrics and grounding, we can now move on to evaluating and iterating the perfect profile. Before we do that, though, we need to clean up the output from the LLM evaluation block. This will require us to parse the recommendations into something more Pythonic, which we’ll tackle in the next section.

9.7.1 Parsing the LLM evaluation output

As the raw output from the evaluation block is text, we now want to parse that into something more usable. Of course, writing parsing functions is simple, but there are better ways to cast responses automagically. We covered better methods for returning responses in chapter 5, on agent actions.

Open chapter_09\prompt_flow\recommender_with_parsing\flow.dag.yaml in VS Code, and look at the flow in the visual editor. Locate the parsing_results block, and click the link to open the Python file in the editor, as shown in figure 9.20.

Figure 9.20 The parsing_results.py file opened in VS Code

The code for the parsing_results.py file is shown in listing 9.2.

Listing 9.2 parsing_results.py
from promptflow import tool

@tool      #1
def parse(input: str) -> str:
    # Splitting the recommendations into individual movie blocks
    rblocks = input.strip().split("\n\n")      #2

    # Function to parse individual recommendation block into dictionary
    def parse_block(block):
        lines = block.split('\n')
        rdict = {}
        for line in lines:
            kvs = line.split(': ')
            key, value = kvs[0], kvs[1]
            rdict[key.lower()] = value     #3
        return rdict

    parsed = [parse_block(block) for block in rblocks]    #4

    return parsed
#1 Special decorator used to mark the tool block

#2 Splits the input on double newlines

#3 Creates the dictionary entry and sets the value

#4 Iterates over each block and parses it into a dictionary of key/value pairs
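To exercise listing 9.2 outside of prompt flow, the same logic can be run standalone. This sketch drops the @tool decorator (so no promptflow installation is needed) and uses a maxsplit of 1 so values containing colons survive:

```python
def parse(input: str):
    """Standalone version of listing 9.2's parse tool (no @tool decorator)."""
    rblocks = input.strip().split("\n\n")  # one block per recommendation

    def parse_block(block):
        rdict = {}
        for line in block.split("\n"):
            key, value = line.split(": ", 1)
            rdict[key.lower()] = value
        return rdict

    return [parse_block(block) for block in rblocks]

raw = "Title: Primer\nSubject: 5\nFormat: 5\nGenre: 4"
print(parse(raw))
# -> [{'title': 'Primer', 'subject': '5', 'format': '5', 'genre': '4'}]
```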

We’re converting the recommendations output from listing 9.1, which is just a string, into a dictionary. So this code will convert this string into the JSON block shown next:

Before parsing:

"Title: The Butterfly Effect
Subject: 5
Format: 5
Genre: 4

Title: Primer
Subject: 5
Format: 5
Genre: 4

Title: Time Bandits
Subject: 5
Format: 5
Genre: 5"

After parsing:

       {
            "title": "The Butterfly Effect",
            "subject": "5",
            "format": "5",
            "genre": "4"
        },
        {
            "title": "Primer",
            "subject": "5",
            "format": "5",
            "genre": "4"
        },
        {
            "title": "Time Bandits",
            "subject": "5",
            "format": "5",
            "genre": "5"
        }

The output of this parsing_results block now gets passed to the output and is wrapped in a list of recommendations. We can see what all this looks like by running the flow.

Open flow.dag.yaml for the flow in the visual editor, and click the Play (Run All) button. Be sure to select to use both recommender variants. You’ll see both variations run and output to the terminal.

At this point, we have a full working recommendation and LLM evaluation flow that outputs a score for each criterion on each output. However, to do comprehensive evaluations of a particular profile, we want to generate multiple recommendations with various criteria. We’ll see how to do batch processing of flows in the next section.

9.7.2 Running batch processing in prompt flow

In our generic recommendation profile, we want to evaluate how various input criteria can affect the generated recommendations. Fortunately, prompt flow can batch-process any variations we want to test. The limit is only the time and money we want to spend.

To perform batch processing, we must first create a JSON Lines (JSONL) or JSON list document of our input criteria. If you recall, our input criteria looked like the following in JSON format:

{
    "subject": "time travel",
    "format": "books",
    "genre": "fantasy",
    "custom": "don't include any R rated content"
}

We want to create a list of JSON objects like that just shown, preferably in a random manner. Of course, the simple way to do this is to prompt ChatGPT to create a JSONL document using the following prompt:

I am developing a recommendation agent. The agent will recommend anything given the following criteria:

1. subject - examples: time travel, cooking, vacation

2. format - examples: books, movies, games

3. genre: documentary, action, romance

4. custom: don't include any R rated content

Can you please generate a random list of these criteria and output it in the format of a JSON Lines file, JSONL. Please include 10 items in the list.

Try this out by going to ChatGPT and entering the preceding prompt. A previously generated file can be found in the flow folder, called \bulk_recommend.jsonl. The contents of this file have been shown here for reference:

{
  "subject": "time travel",
  "format": "books",
  "genre": "fantasy",
  "custom": "don't include any R rated content"
}
{
  "subject": "space exploration",
  "format": "podcasts",
  "genre": "sci-fi",
  "custom": "include family-friendly content only"
}
{
  "subject": "mystery",
  "format": "podcasts",
  "genre": "fantasy",
  "custom": "don't include any R rated content"
}
{
  "subject": "space exploration",
  "format": "podcasts",
  "genre": "action",
  "custom": "include family-friendly content only"
}
{
  "subject": "vacation",
  "format": "books",
  "genre": "thriller",
  "custom": "don't include any R rated content"
}
{
  "subject": "mystery",
  "format": "books",
  "genre": "sci-fi",
  "custom": "don't include any R rated content"
}
{
  "subject": "mystery",
  "format": "books",
  "genre": "romance",
  "custom": "don't include any R rated content"
}
{
  "subject": "vacation",
  "format": "movies",
  "genre": "fantasy",
  "custom": "don't include any R rated content"
}
{
  "subject": "cooking",
  "format": "TV shows",
  "genre": "thriller",
  "custom": "include family-friendly content only"
}
{
  "subject": "mystery",
  "format": "movies",
  "genre": "romance",
  "custom": "include family-friendly content only"
}
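Instead of prompting ChatGPT, a bulk file like this can also be generated programmatically. Here is a minimal standard-library sketch; the criteria pools are illustrative, and its random picks won’t match the file above:

```python
import json
import random

# Pools of criteria values to sample from; extend these to cover more cases.
subjects = ["time travel", "cooking", "vacation", "mystery", "space exploration"]
formats = ["books", "movies", "games", "podcasts", "TV shows"]
genres = ["documentary", "action", "romance", "sci-fi", "fantasy", "thriller"]
customs = ["don't include any R rated content",
           "include family-friendly content only"]

random.seed(42)  # reproducible sampling
lines = [
    json.dumps({
        "subject": random.choice(subjects),
        "format": random.choice(formats),
        "genre": random.choice(genres),
        "custom": random.choice(customs),
    })
    for _ in range(10)
]
jsonl = "\n".join(lines)  # write this string to the bulk JSONL file
print(len(jsonl.splitlines()))  # -> 10
```

Generating the file in code makes it easy to scale the batch up or to sample the criteria space systematically rather than relying on whatever a chat model happens to produce.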

With this bulk file, we can run both variants using the various input criteria in the bulk JSONL file. Open the flow.dag.yaml file in the visual editor, click Batch (the beaker icon) to start the bulk-data loading process, and select the file as shown in figure 9.21. For some operating systems, this may appear as Local Data File.

Figure 9.21 Loading a bulk JSONL file to run the flow over multiple input variations

After the bulk file is selected, a new YAML document will open with a Run link added at the bottom of the file, as shown in figure 9.22. Click the link to do the batch run of inputs.

Figure 9.22 Running the bulk inputs

At this point, a few things will happen. The flow visual editor will appear, and beside that a log file will open, showing the progress of the run. In the terminal window, you’ll see the various worker processes spawning and running.

Be patient. The batch run, even for 10 items, may take a few minutes or seconds, depending on various factors such as hardware, previous calls, and so on. Wait for the run to complete, and you’ll see a summary of results in the terminal.

You can also view the run results by opening the prompt flow extension and selecting the last run, as shown in figure 9.23. Then, you can dig into each run by clicking the table cells. A lot of information is exposed in this dialog, which can help you troubleshoot flows and profiles.

Figure 9.23 The initial run visualization and inspection of the batch run

A lot of information is captured during a batch run, and you can explore much of it through the visualizer. More information can be found by clicking the output folder link from the terminal window. This opens another session of VS Code with the output folder, allowing you to review the run logs and other details.

Now that we’ve completed the batch run for each variant, we can apply grounding and evaluate the results of both prompts. The next section will use a new flow to perform the profile/prompt evaluation.

9.7.3 Creating an evaluation flow for grounding

Open chapter_3\prompt_flow\evaluate_groundings\flow.dag.yaml in the visual editor, as shown in figure 9.24. There are no LLM blocks in the evaluation flow—just Python code blocks that will run the scoring and then aggregate the scores.

Figure 9.24 The evaluate_groundings flow used to ground recommendation runs

We can now look at the code for the scoring and aggregate blocks, starting with the scoring code in listing 9.3. This scoring code averages the score for each criterion into an average score. The output of the function is a list of processed recommendations.

Listing 9.3 line_process.py
from promptflow import tool

@tool
def line_process(recommendations: str):     #1
    inputs = recommendations
    output = []
    for data_dict in inputs:                      #2
        total_score = 0
        score_count = 0

        for key, value in data_dict.items():     #2
                if key != "title":     #3
                    try:
                        total_score += float(value)
                        score_count += 1
                        data_dict[key] = float(value)     #4
                    except:
                        pass

        avg_score = total_score / score_count if score_count > 0 else 0

        data_dict["avg_score"] = round(avg_score, 2)    #5
        output.append(data_dict)

    return output
#1 The three recommendations are input into the function.

#2 Iterates over each recommendation and its criteria.

#3 The title isn’t a criterion, so it’s ignored.

#4 Totals the score for all criteria, and casts the float value onto the key.

#5 The average score becomes the grounding score for the recommendation.
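Listing 9.3’s averaging can be exercised directly on the parsed output from earlier. This standalone sketch drops the @tool decorator so it runs without promptflow; the logic is otherwise the same:

```python
def line_process(recommendations):
    """Standalone version of listing 9.3: average each item's numeric scores."""
    output = []
    for data_dict in recommendations:
        total_score = 0
        score_count = 0
        for key, value in data_dict.items():
            if key != "title":
                try:
                    total_score += float(value)
                    score_count += 1
                    data_dict[key] = float(value)
                except ValueError:
                    pass  # skip non-numeric values
        avg = total_score / score_count if score_count > 0 else 0
        data_dict["avg_score"] = round(avg, 2)
        output.append(data_dict)
    return output

item = {"title": "Primer", "subject": "5", "format": "5", "genre": "4"}
print(line_process([item])[0]["avg_score"])  # -> 4.67
```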

From the grounded recommendations, we can move on to aggregating the scores with the aggregate block—the code for the aggregate block is shown in the following listing.

Listing 9.4 aggregate.py
from typing import List

from promptflow import log_metric, tool

@tool
def aggregate(processed_results: List[str]):
    items = [item for sublist in processed_results 
               for item in sublist]     #1

    aggregated = {}

    for item in items:
        for key, value in item.items():
            if key == 'title':
                continue

            if isinstance(value, (float, int)):      #2
                if key in aggregated:
                    aggregated[key] += value
                else:
                    aggregated[key] = value

    for key, value in aggregated.items():      #3
        value = value / len(items)
        log_metric(key=key, value=value)     #4
        aggregated[key] = value

    return aggregated
#1 The input is a list of lists; flatten it into a list of items.

#2 Checks whether the value is numeric, and sums the score for each criterion key.

#3 Loops over the aggregated criteria scores.

#4 Logs the criterion as a metric.
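Listing 9.4’s aggregation can likewise be run standalone. This sketch drops log_metric (a promptflow call) so it has no external dependency; the flatten-and-average logic is the same:

```python
def aggregate(processed_results):
    """Standalone version of listing 9.4: mean of each criterion across items."""
    # The input is a list of lists; flatten it into a single list of items.
    items = [item for sublist in processed_results for item in sublist]
    aggregated = {}
    for item in items:
        for key, value in item.items():
            if key == "title":
                continue  # title isn't a criterion
            if isinstance(value, (float, int)):
                aggregated[key] = aggregated.get(key, 0) + value
    # Divide each summed criterion by the item count to get the mean.
    return {key: value / len(items) for key, value in aggregated.items()}

runs = [
    [{"title": "A", "subject": 5.0, "avg_score": 4.67}],
    [{"title": "B", "subject": 3.0, "avg_score": 3.0}],
]
print(aggregate(runs)["subject"])  # -> 4.0
```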

The result of the aggregations will be a summary score for each criterion and the average score. Since the evaluation/grounding flow is separate, it can be run over any recommendation run we perform. This will allow us to use the batch run results for any variation to compare results.

We can run the grounding flow by opening flow.dag.yaml in the visual editor and clicking Batch (beaker icon). Then, when prompted, we select an existing run and then select the run we want to evaluate, as shown in figure 9.25. This will open a YAML file with the Run link at the bottom, as we’ve seen before. Click the Run link to run the evaluation.

Figure 9.25 Loading data from a previous run for grounding and evaluation

After the run is completed, you’ll see a summary of the results in the terminal window. You can click the output link to open the folder in VS Code and analyze the results, but there is a better way to compare them.

Open the prompt flow extension, focus on the Batch Run History window, and scroll down to the Run against Run section, as shown in figure 9.26. Select the runs you want to compare—likely the ones near the top—so that the checkmark appears. Then, right-click the run, and select the Visualize Runs option. The Batch Run Visualization window opens, and you’ll see the metrics for each of the runs at the top.

Figure 9.26 Visualizing the metrics of multiple runs for comparison

We can now see a significant difference between profile/prompt variation 0, the user prompt, and variation 1, the system prompt. Refer to figure 9.15 if you need a refresher on what the prompts/profiles look like. At this point, it should be evident that injecting the input parameters into the system prompt provides better recommendations.

You can now go back and try other profiles or other variant options to see what effect this has on your recommendations. The possibilities are virtually endless, but hopefully you can see what an excellent tool prompt flow will be for building agent profiles and prompts.

9.7.4 Exercises

Use the following exercises to improve your knowledge of the material:

  • Exercise 1 —Create a New Prompt Variant for Recommender Flow (Intermediate)

Objective —Improve the recommendation results by creating and testing a new prompt variant in prompt flow.

Tasks:

    • Create a new prompt variant for the recommender flow in prompt flow.
    • Run the flow in batch mode.
    • Evaluate the results to determine if they are better or worse compared to the original prompt.
  • Exercise 2 —Add a Custom Field to the Rubric and Evaluate (Intermediate)

Objective —Enhance the evaluation criteria by incorporating a custom field into the rubric and updating the evaluation flow.

Tasks:

    • Add the custom field as a new criterion to the rubric.
    • Update the evaluation flow to score the new criterion.
    • Evaluate the results, and analyze the effect of the new criterion on the evaluation.
  • Exercise 3 —Develop a New Use Case and Evaluation Rubric (Advanced)

Objective —Expand the application of prompt engineering by developing a new use case and creating an evaluation rubric.

Tasks:

    • Develop a new use case aside from the recommendation.
    • Build the prompt for the new use case.
    • Create a rubric for evaluating the new prompt.
    • Update or alter the evaluation flow to aggregate and compare the results of the new use case with existing ones.
  • Exercise 4 —Evaluate Other LLMs Using LM Studio (Intermediate)

Objective —Assess the performance of different open source LLMs by hosting a local server with LM Studio.

Tasks:

    • Use LM Studio to host a local server for evaluating LLMs.
    • Evaluate other open source LLMs.
    • Consult chapter 2 if assistance is needed for setting up the server and performing the evaluations.
  • Exercise 5 —Build and Evaluate Prompts Using Prompt Flow (Intermediate)

Objective —Apply prompt engineering strategies to build and evaluate new prompts or profiles using prompt flow.

Tasks:

    • Build new prompts or profiles for evaluation using prompt flow.
    • Apply the Write Clear Instructions prompt engineering strategy from chapter 2.
    • Evaluate the prompts and profiles using prompt flow.
    • Refer to chapter 2 for tactics and implementation details if a refresher is needed.

Summary

  • An agent profile consists of several other component prompts that can drive functions such as actions/tools, knowledge, memory, evaluation, reasoning, feedback, and planning.
  • Prompt flow can be used to evaluate an agent’s component prompts.
  • Systematic prompt engineering is an iterative process of evaluating a prompt and agent profile.
  • The Test Changes Systematically strategy describes iterating and evaluating prompts, and systematic prompt engineering implements this strategy.
  • Agent profiles and prompt engineering have many similarities. We define an agent profile as the combination of prompt engineering elements that guide and help an agent through its task.
  • Prompt flow is an open source tool from Microsoft that provides several features for developing and evaluating profiles and prompts.
  • An LLM connection in prompt flow supports additional parameters, including temperature, stop token, max tokens, and other advanced parameters.
  • LLM blocks support prompt and profile variants, which allow for evaluating changes to the prompt/profile or other connection parameters.
  • A rubric applied to an LLM prompt is the criteria and standards a prompt/profile must fulfill to be grounded. Grounding is the scoring and evaluation of a rubric.
  • Prompt flow supports running multiple variations as single runs or batch runs.
  • In prompt flow, an evaluation flow is run after a generative flow to score and aggregate the results. The Visualize Runs option can compare the aggregated criteria from scoring the rubric across multiple runs.

10 Agent reasoning and evaluation

This chapter covers

  • Using various prompt engineering techniques to extend large language model functions
  • Engaging large language models with prompt engineering techniques that elicit reasoning
  • Employing an evaluation prompt to narrow and identify the solution to an unknown problem

Now that we’ve examined the patterns of memory and retrieval that define the semantic memory component in agents, we can take a look at the last and most instrumental component in agents: planning. Planning encompasses many facets, from reasoning, understanding, and evaluation to feedback.

To explore how LLMs can be prompted to reason, understand, and plan, we’ll demonstrate how to engage reasoning through prompt engineering and then expand that to planning. The planning solution provided by the Semantic Kernel (SK) encompasses multiple planning forms. We’ll finish the chapter by incorporating adaptive feedback into a new planner.

Figure 10.1 demonstrates the high-level prompt engineering strategies we’ll cover in this chapter and how they relate to the various techniques we’ll cover. Each of the methods showcased in the figure will be explored in this chapter, from the basics of solution/direct prompting, shown in the top-left corner, to self-consistency and tree of thought (ToT) prompting, in the bottom right.

Figure 10.1 How the two planning prompt engineering strategies align with the various techniques

10.1 Understanding direct solution prompting

Direct solution prompting is generally the first form of prompt engineering that users employ when asking LLMs questions or solving a particular problem. Given any LLM use, these techniques may seem obvious, but they are worth reviewing to establish the foundation of thought and planning. In the next section, we’ll start from the beginning, asking questions and expecting answers.

10.1.1问答提示

10.1.1 Question-and-answer prompting

在本章的练习中,我们将运用提示流(prompt flow)来构建和评估各种技术。(我们已在第9章详细介绍过这个工具,如果您需要复习,请参阅该章节。)提示流是理解这些技术如何运作以及探索规划和推理过程的绝佳工具。

For the exercises in this chapter, we’ll employ prompt flow to build and evaluate the various techniques. (We already extensively covered this tool in chapter 9, so refer to that chapter if you need a review.) Prompt flow is an excellent tool for understanding how these techniques work and exploring the flow of the planning and reasoning process.

打开 Visual Studio Code (VS Code) 并切换到 chapter 10 源文件夹。为该文件夹创建一个新的虚拟环境,然后安装 requirements.txt 中列出的依赖。如果您需要帮助设置各章的 Python 环境,请参阅附录 B。

Open Visual Studio Code (VS Code) to the chapter 10 source folder. Create a new virtual environment for the folder, and install the requirements.txt file. If you need help setting up a chapter’s Python environment, refer to appendix B.

我们将查看 prompt_flow/question-answering-prompting 文件夹中的第一个流程。如图 10.2 所示,在可视化编辑器中打开 flow.dag.yaml 文件。在右侧,您将看到组件流程:顶部是 question_answer LLM 提示,后面是两个 Embedding 组件,最后是名为 evaluate 的用于执行评估的 LLM 提示。

We’ll look at the first flow in the prompt_flow/question-answering-prompting folder. Open the flow.dag.yaml file in the visual editor, as shown in figure 10.2. On the right side, you’ll see the flow of components. At the top is the question_answer LLM prompt, followed by two Embedding components and a final LLM prompt to do the evaluation called evaluate.

图 10.2 flow.dag.yaml 文件在可视化编辑器中打开,突出显示了流程的各个组成部分。

清单 10.1 中的分解图使用类似 YAML 缩减伪代码的方式,更详细地展示了流程的结构和组成部分。您还可以看到各个组件的输入和输出,以及运行该流程的示例输出。

The breakdown in listing 10.1 shows the structure and components of the flow in more detail using a sort of YAML-shortened pseudocode. You can also see the input and outputs to the various components and a sample output from running the flow.

列表 10.1 question-answer-prompting 流程
   输入:
        context:要提问的内容
        question:针对内容提出的问题
        expected:预期答案

   LLM:question_answer(提问时使用的提示)
        输入:
               context 和 question
        输出:
               对问题的预测/答案

   Embeddings:使用 LLM 嵌入模型创建文本的
嵌入表示

     Embedding_predicted:嵌入问答 LLM 的输出
     Embedding_expected:嵌入预期答案的输出

   Python:评估(用于衡量嵌入相似度的 Python 代码)
     输入:
            Embedding_predicted 的输出
            Embedding_expected 的输出
     输出:
            预测值与预期值之间的相似度得分

   输出:
        context: -> input.context
        question: -> input.question
     expected: -> input.expected
     predicted: -> output.question_answer
     evaluation_score: output.evaluation

### 示例输出
{
    "context": "回到未来(1985)……",
    "evaluation_score": 0.9567478002354606,
    "expected": "马蒂穿越回了30年前。",
    "predicted": "在电影《回到未来》中,马蒂从1985年
穿越回30年前的1955年……",
    "question": "在电影《回到未来》(1985)中,马蒂穿越回了多久以前?"
}
   Inputs:
        context  : the content to ask the question about
        question : question asked specific to the content
        expected : the expected answer

   LLM: Question-Answer (the prompt used to ask the question)
        inputs:
               context and question
        outputs: 
               the prediction/answer to the question

   Embeddings: uses an LLM embedding model to create the embedding 
representation of the text

     Embedding_predicted: embeds the output of the Question-Answer LLM
     Embedding_expected: embeds the output of the expected answer

   Python: Evaluation (Python code to measure embedding similarity)
     Inputs:
            Embedding_predicted output
            Embedding_expected output
     Outputs: 
            the similarity score between predicted and expected

   Outputs:
        context: -> input.context
        question: -> input.question
     expected: -> input.expected
     predicted: -> output.question_answer
     evaluation_score: output.evaluation

### Example Output
{
    "context": "Back to the Future (1985)…",
    "evaluation_score": 0.9567478002354606,
    "expected": "Marty traveled back in time 30 years.",
    "predicted": "Marty traveled back in time 30 years from 1985 to 1955 
in the movie \"Back to the Future.\"",
    "question": "How far did Marty travel back in time in the movie 
Back to the Future (1985)"
}
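The evaluation node scores the answer by comparing the two embedding vectors. A minimal sketch of such a similarity measure (pure-Python cosine similarity; the flow's actual Python node may differ in detail) looks like this:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Scores near 1.0, like the 0.9567 in the example output, indicate the predicted answer is semantically close to the expected one.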

运行此流程之前,请确保您的 LLM 模块已正确配置。这可能需要您建立与所选 LLM 的连接。如果您需要了解如何完成此操作,请参阅第 9 章。如果您未使用 OpenAI,则需要使用您的连接来配置 LLM 和 Embedding 模块。

Before running this flow, make sure your LLM block is configured correctly. This may require you to set up a connection to your chosen LLM. Again, refer to chapter 9 if you need a review on how to complete this. You’ll need to configure the LLM and Embedding blocks with your connection if you’re not using OpenAI.

配置好 LLM 连接后,点击可视化编辑器中的“播放”按钮或使用 YAML 编辑器窗口中的“测试”(Shift-F5)链接来运行流程。如果所有连接和配置都正确,您应该会看到类似清单 10.1 中的输出。

After configuring your LLM connection, run the flow by clicking the Play button from the visual editor or using the Test (Shift-F5) link in the YAML editor window. If everything is connected and configured correctly, you should see output like that in listing 10.1.

在 VS Code 中打开question_answer.jinja2文件,如清单 10.2 所示。该清单展示了基本的问答式提示。在这种提示中,系统消息会描述基本规则并提供回答问题的上下文。在第 4 章中,我们探讨了检索增强生成 (RAG) 模式,而此提示遵循类似的模式。

Open the question_answer.jinja2 file in VS Code, as shown in listing 10.2. This listing shows the basic question-and-answer-style prompt. In this style of prompt, the system message describes the basic rules and provides the context to answer the question. In chapter 4, we explored the retrieval augmented generation (RAG) pattern, and this prompt follows a similar pattern.

清单 10.2 question_answer.jinja2
系统:
请根据以下上下文回答用户的问题。请保留答案。
简明扼要。如果不确定答案,请回复“不确定”。
回答。

上下文:{{context}}    #1

用户:
问题:{{question}}    #2
system:
Answer the users question based on the context below. Keep the answer 
short and concise. Respond "Unsure about answer" if not sure about the 
answer.

Context: {{context}}     #1

user:
Question: {{question}}     #2
#1 替换为LLM课程应该回答的问题的内容。

#2 替换为问题本身。

本练习展示了使用 LLM 就一段内容提问的简单方法。然后,使用相似度匹配得分来评估问题的回答。从清单 10.1 的输出可以看出,LLM 在回答与上下文相关的问题方面表现出色。下一节,我们将探讨另一种使用直接提示的类似技术。

This exercise shows the simple method of using an LLM to ask questions about a piece of content. Then, the question response is evaluated using a similarity matching score. We can see from the output in listing 10.1 that the LLM does a good job of answering a question about the context. In the next section, we’ll explore a similar technique that uses direct prompting.
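Prompt flow fills the {{context}} and {{question}} placeholders before sending the prompt to the model. A rough stand-in for that substitution step (plain Python; prompt flow actually uses a Jinja2 renderer) can be sketched as:

```python
def render_prompt(template: str, values: dict[str, str]) -> str:
    """Naively substitute {{name}} placeholders, mimicking a template render."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template

template = (
    "system:\n"
    "Answer the users question based on the context below.\n"
    "Context: {{context}}\n\n"
    "user:\n"
    "Question: {{question}}"
)
prompt = render_prompt(template, {
    "context": "Back to the Future (1985)...",
    "question": "How far did Marty travel back in time?",
})
print(prompt)
```

The rendered string is what the LLM block ultimately receives as its chat messages.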

10.1.2实施少样本提示

10.1.2 Implementing few-shot prompting

少样本提示类似于问答式提示,但提示的构成更侧重于提供少量示例,而非事实或上下文。这使得 LLM 能够顺应之前未曾见过的模式或内容。虽然这种方法听起来像是问答式提示,但其实现方式截然不同,而且效果可能非常强大。

Few-shot prompting is like question-and-answer prompting, but the makeup of the prompt is more about providing a few examples than about facts or context. This allows the LLM to bend to patterns or content not previously seen. While this approach sounds like question and answer, the implementation is quite different, and the results can be powerful.

在 VS Code 和可视化编辑器中打开prompt_flow/few-shot-prompting/flow.dag.yaml。大部分流程与前面图 10.2 所示的流程类似,不同之处在清单 10.3 中突出显示,清单 10.3 显示了 YAML 伪代码表示。此流程与之前流程的主要区别在于输入和 LLM 提示。

Open prompt_flow/few-shot-prompting/flow.dag.yaml in VS Code and the visual editor. Most of the flow looks like the one pictured earlier in figure 10.2, and the differences are highlighted in listing 10.3, which shows a YAML pseudocode representation. The main differences between this and the previous flow are the inputs and LLM prompt.

列表 10.3 few-shot-prompting 流程
   输入:
       语句:介绍上下文,然后请求输出
       预期答案:对该陈述的预期答案
  LLM:few_shot(用于提问的提示)
       输入:语句
       输出:对该陈述的预测/答案

   嵌入:使用 LLM 嵌入模型创建嵌入。
文本的表示

        Embedding_predicted:嵌入 few_shot LLM 的输出。
        Embedding_expected:嵌入预期答案的输出

   Python:评估(用于衡量嵌入相似度的 Python 代码)
        输入:
               嵌入预测输出
               嵌入预期输出
        输出:预测值与预期值之间的相似度得分

输出:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.few_shot
        evaluation_score: output.evaluation

### 示例输出
{
    "evaluation_score": 0.906647282920417,    #1
    "expected": "我们吃了 sunner,看着太阳落山。",
    "predicted": "长途跋涉之后,我们坐在湖边,
看着天空变换成绚丽的橙色和粉色,
享受了一顿宁静的 sunner。",    #2
    "statement": "sunner 是我们在加拿大日落时分吃的一餐,
请用这个词造句。"    #3
}
   Inputs:
       statement  : introduces the context and then asks for output
       expected : the expected answer to the statement
   LLM: few_shot (the prompt used to ask the question)
       inputs: statement
       outputs: the prediction/answer to the statement

   Embeddings: uses an LLM embedding model to create the embedding 
representation of the text

        Embedding_predicted: embeds the output of the few_shot LLM
        Embedding_expected: embeds the output of the expected answer

   Python: Evaluation (Python code to measure embedding similarity)
        Inputs:
               Embedding_predicted output
               Embedding_expected output
        Outputs: the similarity score between predicted and expected

Outputs:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.few_shot
        evaluation_score: output.evaluation

### Example Output
{
    "evaluation_score": 0.906647282920417,     #1
    "expected": "We ate sunner and watched the setting sun.",
    "predicted": "After a long hike, we sat by the lake 
and enjoyed a peaceful sunner as the sky turned 
brilliant shades of orange and pink.",     #2
    "statement": "A sunner is a meal we eat in Canada 
at sunset, please use the word in a sentence"     #3
}
#1 评估分数代表预期值与预测值之间的相似度。

#2 在句子中使用 sunner。

#3 这是一个错误陈述,但目的是让 LLM 像使用真实词语一样使用这个词。

按 Shift-F5 或点击可视化编辑器中的“播放/测试”按钮运行流程。您应该看到类似清单 10.3 的输出,其中 LLM 在给定初始语句的情况下,正确地在句子中使用了sunner一词(一个虚构的词)。

Run the flow by pressing Shift-F5 or clicking the Play/Test button from the visual editor. You should see output like listing 10.3 where the LLM has used the word sunner (a made-up term) correctly in a sentence given the initial statement.

这项练习展示了利用提示改变 LLM 行为、使其与已学知识相悖的能力。我们正在改变 LLM 对“正确”的理解。此外,我们还会利用这种改变后的视角,引导它使用一个自造词。

This exercise demonstrates the ability to use a prompt to alter the behavior of the LLM to be contrary to what it has learned. We’re changing what the LLM understands to be accurate. Furthermore, we then use that modified perspective to elicit the use of a made-up word.

在 VS Code 中打开清单 10.4 所示的 few_shot.jinja2 提示。该清单演示了如何设置一个简单的角色(一位古怪的词典编纂者),并提供它之前定义和使用过的单词示例。提示的基础部分允许 LLM 扩展这些示例,并使用其他单词生成类似的结果。

Open the few_shot.jinja2 prompt in VS Code, shown in listing 10.4. This listing demonstrates setting up a simple persona, that of an eccentric dictionary maker, and then providing examples of words it has defined and used before. The base of the prompt allows for the LLM to extend the examples and produce similar results using other words.

清单 10.4 few_shot.jinja2
系统:
你是一位特立独行的词典编纂者。你将被要求……

用这个词造句。
以下示例演示了如何使用……来构建句子。
这个词。
“whatpu”是一种原产于坦桑尼亚的小型毛茸茸动物。
包含单词 whatpu 的句子示例是:    #1
我们在非洲旅行时,看到了这些非常可爱的小家伙。
“farduddle”指的是快速地上下跳跃。例如……
使用了单词 farduddle 的句子是:
我兴奋得开始胡言乱语。    #2

请仅返回用户请求的句子。  #3

用户:
{{statement}}   #4
system:
You are an eccentric word dictionary maker. You will be asked to 

construct a sentence using the word.
The following are examples that demonstrate how to craft a sentence using 
the word.
A "whatpu" is a small, furry animal native to Tanzania. 
An example of a sentence that uses the word whatpu is:     #1
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a 
sentence that uses the word farduddle is:
I was so excited that I started to farduddle.     #2

Please only return the sentence requested by the user.   #3

user:
{{statement}}    #4
#1 演示定义一个自造词并在句子中使用它的示例

#2 演示另一个示例

#3 防止 LLM 输出额外信息的规则

#4 输入语句定义了一个新词并要求使用它。

你可能会说我们这是在强迫 LLM 产生幻觉,但这种技术是改变行为的基础。它允许我们构建提示,引导 LLM 做出与其所学完全相反的行为。这种提示基础也为其他形式的行为改变奠定了技术基础。在能够改变 LLM 的感知和背景之后,我们将在下一节展示直接求解的最后一个示例。

You may say we’re forcing the LLM to hallucinate here, but this technique is the basis for modifying behavior. It allows prompts to be constructed to guide an LLM to do everything contrary to what it learned. This foundation of prompting also establishes techniques for other forms of altered behavior. From the ability to alter the perception and background of an LLM, we’ll move on to demonstrate a final example of a direct solution in the next section.
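The structure of listing 10.4 can also be assembled programmatically. The sketch below is a hypothetical helper (not code from the chapter) that builds a few-shot chat message list from example pairs, using the OpenAI-style role/content message shape as an assumption:

```python
def build_few_shot_messages(persona: str, examples: list[tuple[str, str]],
                            statement: str) -> list[dict[str, str]]:
    """Compose a few-shot prompt: persona + worked examples + the new request."""
    lines = [persona]
    for definition, sample_sentence in examples:
        lines.append(definition)
        lines.append("Example sentence: " + sample_sentence)
    lines.append("Please only return the sentence requested by the user.")
    return [
        {"role": "system", "content": "\n".join(lines)},
        {"role": "user", "content": statement},
    ]

messages = build_few_shot_messages(
    "You are an eccentric word dictionary maker.",
    [('A "whatpu" is a small, furry animal native to Tanzania.',
      "We were traveling in Africa and we saw these very cute whatpus."),
     ('To do a "farduddle" means to jump up and down really fast.',
      "I was so excited that I started to farduddle.")],
    "A sunner is a meal we eat in Canada at sunset, "
    "please use the word in a sentence",
)
```

Adding or swapping tuples in the examples list is all it takes to point the same persona at new made-up words.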

10.1.3利用零样本提示提取普遍规律

10.1.3 Extracting generalities with zero-shot prompting

零样本提示(或零样本学习)是指以某种方式构造提示,使大型语言模型(LLM)能够进行泛化。这种泛化能力蕴含在 LLM 内部,并通过零样本提示得以展现:提示中不提供任何示例,而是给出一套指导原则或规则来引导 LLM。

Zero-shot prompting or learning is the ability to generate a prompt in such a manner that allows the LLM to generalize. This generalization is embedded within the LLM and demonstrated through zero-shot prompting, where no examples are given, but instead a set of guidelines or rules are given to guide the LLM.

运用这种技术简单有效,能够引导 LLM 仅凭其内部知识、不依赖其他上下文来生成回复。这是一种巧妙而强大的技术,可以将 LLM 的知识应用到其他场景。该技术与其他提示策略相结合,已被证明能够有效替代其他语言分类模型,例如识别文本中情绪或情感的模型。

Employing this technique is simple and works well to guide the LLM to generate replies given its internal knowledge and no other contexts. It’s a subtle yet powerful technique that applies the knowledge of the LLM to other applications. This technique, combined with other prompting strategies, is proving effective at replacing other language classification models—models that identify the emotion or sentiment in text, for example.

在 VS Code 的提示流可视化编辑器中打开 prompt_flow/zero-shot-prompting/flow.dag.yaml。此流程与前面图 10.2 所示的流程几乎完全相同,但在实现上略有不同,如清单 10.5 所示。

Open prompt_flow/zero-shot-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. This flow is again almost identical to that shown earlier in figure 10.2 but differs slightly in implementation, as shown in listing 10.5.

列表 10.5 zero-shot-prompting 流程
   输入:
        声明:待分类的声明
        预期:该陈述的预期分类

    LLM:zero_shot(用于分类的提示)
        输入:语句
        输出:根据语句预测的类别

    嵌入:使用 LLM 嵌入模型创建嵌入。
文本的表示

    Embedding_predicted:嵌入零样本 LLM 的输出。
    Embedding_expected:嵌入预期答案的输出

    Python:评估(用于衡量嵌入相似度的 Python 代码)
        输入:
               嵌入预测输出
             嵌入预期输出
          输出:预测值与预期值之间的相似度得分

   输出:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.few_shot
        evaluation_score: output.evaluation

   ### 示例输出
{
       "evaluation_score": 1,    #1
       "expected": "neutral",
       "predicted": "neutral",
       "statement": "我认为这次假期还可以。"    #2
   }
   Inputs:
        statement  : the statement to be classified
        expected : the expected classification of the statement

    LLM: zero_shot (the prompt used to classify)
        inputs: statement
        outputs: the predicted class given the statement

    Embeddings: uses an LLM embedding model to create the embedding 
representation of the text

    Embedding_predicted: embeds the output of the zero_shot LLM
    Embedding_expected: embeds the output of the expected answer

    Python: Evaluation (Python code to measure embedding similarity)
        Inputs:
               Embedding_predicted output
             Embedding_expected output
          Outputs: the similarity score between predicted and expected

   Outputs:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.few_shot
        evaluation_score: output.evaluation

   ### Example Output
{
       "evaluation_score": 1,     #1
       "expected": "neutral",
       "predicted": "neutral",
       "statement": "I think the vacation is okay. "     #2
   }
#1 显示完美的评估分数 1.0

#2 我们要求 LLM 分类的语句

在 VS Code 的提示符流程可视化编辑器中按 Shift-F5 运行流程。您应该会看到类似于清单 10.5 中所示的输出。

Run the flow by pressing Shift-F5 within the VS Code prompt flow visual editor. You should see output similar to that shown in listing 10.5.

现在打开清单 10.6 所示的 zero_shot.jinja2 提示。该提示很简单,没有使用任何示例来提取文本的情感倾向。尤其值得注意的是,提示中甚至没有出现“情感”一词,但 LLM 似乎理解了其意图。

Now open the zero_shot.jinja2 prompt as shown in listing 10.6. The prompt is simple and uses no examples to extract the sentiment from the text. What is especially interesting to note is that the prompt doesn’t even mention the phrase sentiment, and the LLM seems to understand the intent.

清单 10.6 zero_shot.jinja2
系统:
将文本分类为中性、负面或正面。
只返回结果,不做其他任何事。    #1

用户:
{{statement}}    #2
system:
Classify the text into neutral, negative or positive. 
Return only the result and nothing else.     #1

user:
{{statement}}     #2
#1 提供分类操作的基本指导

#2 待分类文本的描述

零样本提示工程利用的是 LLM 基于其训练材料进行广泛泛化的能力。本练习展示了如何将 LLM 内部的知识应用于其他任务。LLM 自我情境化和运用知识的能力可以超越其训练范围。下一节,我们将通过考察 LLM 如何推理来进一步拓展这一概念。

Zero-shot prompt engineering is about using the ability of the LLM to generalize broadly based on its training material. This exercise demonstrates how knowledge within the LLM can be put to work for other tasks. The LLM’s ability to self-contextualize and apply knowledge can extend beyond its training. In the next section, we extend this concept further by looking at how LLMs can reason.
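Because the zero-shot prompt asks for exactly one of three labels, it's worth guarding the raw model reply before trusting it as a classification. The helper below is a hypothetical post-processing sketch (the chapter's flow instead scores the reply with an embedding comparison):

```python
VALID_LABELS = {"neutral", "negative", "positive"}

def normalize_label(reply: str) -> str:
    """Map a raw LLM reply onto one of the allowed sentiment labels."""
    cleaned = reply.strip().strip(".").lower()
    if cleaned in VALID_LABELS:
        return cleaned
    # Fall back if the model wrapped the label in extra words.
    for label in VALID_LABELS:
        if label in cleaned:
            return label
    return "unknown"

print(normalize_label("Neutral."))              # neutral
print(normalize_label("The text is positive"))  # positive
```

Returning an explicit "unknown" makes it easy to flag replies where the model ignored the instruction.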

10.2提示工程中的推理

10.2 Reasoning in prompt engineering

像 ChatGPT 这样的大型语言模型(LLM)被开发出来是为了充当聊天补全模型:将文本内容输入模型,模型做出与完成该请求相符的响应。LLM 从未被训练去推理、规划或思考。

LLMs like ChatGPT were developed to function as chat completion models, where text content is fed into the model, whose responses align with completing that request. LLMs were never trained to reason, plan, think, or have thoughts.

然而,正如上一节的例子所展示的,可以通过提示来提取 LLM 的泛化能力,并将其扩展到最初设计之外。虽然 LLM 并非为推理而设计,但输入模型的训练材料提供了对推理、规划和思考的理解。因此,从这个意义上讲,LLM 理解什么是推理,并能够运用推理的概念。

However, much like we demonstrated with the examples in the previous section, LLMs can be prompted to extract their generalities and be extended beyond their initial design. While an LLM isn’t designed to reason, the training material fed into the model provides an understanding of reasoning, planning, and thought. Therefore, by extension, an LLM understands what reasoning is and can employ the concept of reasoning.

为了展示这种推理能力,我们将探讨另一组允许或模拟推理行为的提示工程技术。通常,在评估推理能力的应用时,我们会让 LLM 解决一些它原本并非为之设计的挑战性问题。逻辑题、数学题和文字题就是此类问题的良好来源。

We’ll look at another set of prompt engineering techniques that allow or mimic reasoning behavior to demonstrate this reasoning ability. Typically, when evaluating the application of reasoning, we look to having the LLM solve challenging problems it wasn’t designed to solve. A good source of such is based on logic, math, and word problems.

沿用时间旅行的主题,还有什么比理解时间旅行更适合解决的独特问题呢?图 10.3 展示了一个极具挑战性的时间旅行问题示例。我们的目标是掌握如何以恰当的方式提示 LLM,使其能够正确解决该问题。

Using the time travel theme, what class of unique problems could be better to solve than understanding time travel? Figure 10.3 depicts one example of a uniquely challenging time travel problem. Our goal is to acquire the ability to prompt the LLM in a manner that allows it to solve the problem correctly.

图 10.3 我们计划使用具备推理和规划能力的 LLM 来解决的时间旅行问题的复杂性

时间旅行问题是一种思维练习,但解决起来可能比想象中困难得多。对 LLM 来说,图 10.3 中的例子很难解决,但它出错的地方可能会让你感到惊讶。下一节将在提示中运用推理来解决这些独特的问题。

Time travel problems are thought exercises that can be deceptively difficult to solve. The example in figure 10.3 is complicated to solve for an LLM, but the part it gets wrong may surprise you. The next section will use reasoning in prompts to solve these unique problems.

10.2.1思维链提示

10.2.1 Chain of thought prompting

思维链(CoT)提示是一种提示工程技术,它利用单样本或少样本示例来描述推理过程以及达成预期目标的步骤。通过演示推理过程,LLM 可以将这一原则泛化,并对类似的问题和目标进行推理。虽然 LLM 的训练目标并非推理,但我们可以通过提示工程来引导模型进行推理。

Chain of thought (CoT) prompting is a prompt engineering technique that employs the one-shot or few-shot examples that describe the reasoning and the steps to accomplish a desired goal. Through the demonstration of reasoning, the LLM can generalize this principle and reason through similar problems and goals. While the LLM isn’t trained with the goal of reasoning, we can elicit the model to reason, using prompt engineering.

在 VS Code 提示流程可视化编辑器中打开prompt_flow/chain-of-thought-prompting/flow.dag.yaml。如图 10.4 所示,此流程的元素很简单。它仅包含两个 LLM 模块,首先使用 CoT 提示来解决一个复杂问题;然后,第二个 LLM 提示评估答案。

Open prompt_flow/chain-of-thought-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. The elements of this flow are simple, as shown in figure 10.4. With only two LLM blocks, the flow first uses a CoT prompt to solve a complex question; then, the second LLM prompt evaluates the answer.

图 10.4 CoT 的流程

清单 10.7 展示了更详细描述流程块及其输入/输出的 YAML 伪代码。本例中的默认问题陈述与图 10.3 中的不同。

Listing 10.7 shows the YAML pseudocode that describes the blocks and the inputs/outputs of the flow in more detail. The default problem statement in this example isn’t the same as in figure 10.3.

列表 10.7 chain-of-thought-prompting 流程
   输入:
        陈述:待解决的问题的陈述
        预期:问题的预期解决方案

  LLM:cot(用于解决问题的提示)
        输入:语句
        输出:根据问题陈述预测的答案

   LLM:evaluate_answer(用于评估解决方案的提示)
        输入:
               语句:-> 输入语句
               预期结果:-> input.expected
               预测结果:-> output.cot

        输出结果:问题解答程度的评分。


   输出:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.cot
        evaluation_score: output.evaluate_answer

   ### 示例输出
{
    "evaluation_score": "0.5",    #1
    "expected": "在最后一次跳跃之后,马克斯发现自己
身处1980年,当时他75岁。",    #2
    "predicted": "Max 从 2300 年出发,
以 40 年为单位向过去旅行,每个时期
停留 5 年。旅程如下:
\n\n- 从 2300 年到 2260 年:Max 25 + 5 = 30 岁。
\n- 从 2260 年到 2220 年:Max 30 + 5 = 35 岁。
\n- 从 2220 年到 2180 年:Max 35 + 5 = 40 岁。
\n- 从 2180 年到 2140 年:Max 40 + 5 = 45 岁。
\n- 从 2140 年到 2100 年:Max 45 + 5 = 50 岁。
\n- 从 2100 年到 2060 年:Max 50 + 5 = 55 岁。
\n- 从 2060 年到 2020 年:Max 55 + 5 = 60 岁。
\n- 从 2020 年到 1980 年:Max 60 + 5 = 65 岁。
\n- 从 1980 年到 1940 年:Max 65 + 5 = 70 岁。
\n- 从 1940 年到 1900 年:Max 70 + 5"    #3
}
   Inputs:
        statement  : the statement problem to be solved
        expected : the expected solution to the problem

   LLM: cot (the prompt used to solve the problem)
        inputs: statement
        outputs: the predicted answer given the problem statement

   LLM: evaluate_answer (the prompt used to evaluate the solution)
        inputs:
               statement: -> input.statement
               expected: -> input.expected
               predicted: -> output.cot

        outputs: a score of how well the problem was answered


   Outputs:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.cot
        evaluation_score: output.evaluate_answer

   ### Example Output
{
    "evaluation_score": "0.5",     #1
    "expected": "After the final jump, Max finds himself 
in the year 1980 and he is 75 years old.",     #2
    "predicted": " Max starts in the year 2300 and 
travels backward in 40-year increments, spending 5 years 
in each period. The journeys will be as follows:
\n\n- From 2300 to 2260: Max is 25 + 5 = 30 years old.
\n- From 2260 to 2220: Max is 30 + 5 = 35 years old.
\n- From 2220 to 2180: Max is 35 + 5 = 40 years old.
\n- From 2180 to 2140: Max is 40 + 5 = 45 years old.
\n- From 2140 to 2100: Max is 45 + 5 = 50 years old.
\n- From 2100 to 2060: Max is 50 + 5 = 55 years old.
\n- From 2060 to 2020: Max is 55 + 5 = 60 years old.
\n- From 2020 to 1980: Max is 60 + 5 = 65 years old.
\n- From 1980 to 1940: Max is 65 + 5 = 70 years old.
\n- From 1940 to 1900: Max is 70 + 5"     #3
}
#1 给定解决方案的评估分数

#2 问题的预期答案

#3 预测答案显示了推理步骤和输出。

仔细研究输入数据并检查问题描述;尝试自行评估问题。然后,按 Shift-F5 运行流程。你应该看到与清单 10.7 中类似的输出。

Dig into the inputs and check the problem statement; try to evaluate the problem yourself. Then, run the flow by pressing Shift-F5. You should see output similar to that shown in listing 10.7.

打开如清单 10.8 所示的 cot.jinja2 提示文件。该提示给出了几个时间旅行问题的示例,以及经过深思熟虑和推理的解决方案。向 LLM 展示完成问题的步骤的过程,正是推理机制的来源。

Open the cot.jinja2 prompt file as shown in listing 10.8. This prompt gives a few examples of time travel problems and then the thought-out and reasoned solution. The process of showing the LLM the steps to complete the problem provides the reasoning mechanism.

列表 10.8 cot.jinja2
系统:
"在一部时间旅行电影中,莎拉穿越回过去,
阻止一个历史性事件的发生。她在
事件发生前两天到达。在花了一天时间准备之后,
她试图改变这个事件,但意识到自己
实际上提前了两年到达,而不是两天。然后她
决定留在过去生活,等到事件
原本的日期。在事件当天之前,莎拉在
过去度过了多少天?"    #1

思路链:    #2

    初始假设:莎拉认为她会在活动开始前两天到达。
    准备时间:准备工作耗时1天。
    意识到错误:莎拉意识到她实际上早到了两年。
    年与天的转换:
2 年 = 2 × 365 = 730 天(假设非闰年)。
    调整准备工作所花费的天数:730 - 1 = 729 天。
    结论:莎拉在事件发生当天之前,在过去度过了 729 天。

"在一部科幻电影中,时间旅行者亚历克斯决定
回到过去,亲眼见证一场发生在100年前、
持续了10天的著名历史战役。
他于战斗开始前三天抵达。然而,
在过去待了六天后,他向未来跳跃了50年,
并在那里待了20天。然后,他
回到过去见证战斗的结束。在看到
战斗结束之前,亚历克斯在过去待了多少天?"    #3

思路链:    #4

    首次出行:亚历克斯在战斗开始前三天抵达。
    时间跳跃前的时间:亚历克斯在过去待了六天。
战斗已经开始,并且已经进行了3天(因为他
提前3天到达,而现在已经待了6天,3 + 3 = 6)。
    第一次时间跳跃:亚历克斯跳跃到50年后,并在那里待了20天。
 这使他过去已经度过的6天又增加了20天。
(6 + 20 = 26)。
    重返战场:当亚历克斯返回时,(根据时间旅行的逻辑)
他回到了他离开的同一天。此时战斗
已经进行了三天。
    等待战斗结束:这场战斗持续了10天。因为他
已经目睹了其中3天,还需要再等7天。
    结论:在看到战斗结束之前,Alex 总共在过去度过了
3(初始等待)+ 3(第一次跳跃前)+ 20(50年后的时期)+
7(返回后)= 33 天。
逐步思考,但只给出最终答案。

用户:
{{statement}}    #5
system:
"In a time travel movie, Sarah travels back in time to 
prevent a historic event from happening. She arrives 
2 days before the event. After spending a day preparing, 
she attempts to change the event but realizes she has 
actually arrived 2 years early, not 2 days. She then 
decides to wait and live in the past until the event's 
original date. How many days does Sarah spend in the past 
before the day of the event?"     #1

Chain of Thought:     #2

    Initial Assumption: Sarah thinks she has arrived 2 days before the event.
    Time Spent on Preparation: 1 day spent preparing.
    Realization of Error: Sarah realizes she's actually 2 years early.
    Conversion of Years to Days: 
2 years = 2 × 365 = 730 days (assuming non-leap years).
    Adjust for the Day Spent Preparing: 730 - 1 = 729 days.
    Conclusion: Sarah spends 729 days in the past before the day of the event.

"In a sci-fi film, Alex is a time traveler who decides 
to go back in time to witness a famous historical battle 
that took place 100 years ago, which lasted for 10 days. 
He arrives three days before the battle starts. However, 
after spending six days in the past, he jumps forward in 
time by 50 years and stays there for 20 days. Then, he 
travels back to witness the end of the battle. How many 
days does Alex spend in the past before he sees the end of
 the battle?"     #3

Chain of Thought:     #4

    Initial Travel: Alex arrives three days before the battle starts.
    Time Spent Before Time Jump: Alex spends six days in the past. 
The battle has started and has been going on for 3 days (since he 
arrived 3 days early and has now spent 6 days, 3 + 3 = 6).
    First Time Jump: Alex jumps 50 years forward and stays for 20 days.
 This adds 20 days to the 6 days he's already spent in the past 
(6 + 20 = 26).
    Return to the Battle: When Alex returns, he arrives back on the same 
day he left (as per time travel logic). The battle has been going on for 
3 days now.
    Waiting for the Battle to End: The battle lasts 10 days. Since he's 
already witnessed 3 days of it, he needs to wait for 7 more days.
    Conclusion: Alex spends a total of 3 (initial wait) + 3 (before the 
first jump) + 20 (50 years ago) + 7 (after returning) = 33 days in the 
past before he sees the end of the battle.
Think step by step but only show the final answer to the statement.

user:
{{statement}}     #5
#1 几个问题示例

#2 问题的解决方案,以推理步骤序列的形式输出

#3 几个问题示例

#4 问题的解决方案,以推理步骤序列的形式输出

#5 LLM 需要解决的问题

您可能会注意到,图 10.3 的解答也作为示例列于清单 10.8 中。此外,回顾清单 10.7 中 LLM 对该问题的回复也很有帮助。从中,您可以了解 LLM 得出最终答案的推理步骤。

You may note that the solution to figure 10.3 is also provided as an example in listing 10.8. It’s also helpful to go back and review listing 10.7 for the reply from the LLM about the problem. From this, you can see the reasoning steps the LLM applied to get its final answer.
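One advantage of these worked examples is that their arithmetic can be checked mechanically. A quick Python verification of the Sarah example's chain of thought (assuming non-leap years, as the prompt itself does):

```python
# Sarah arrives 2 years early, not 2 days, and spends 1 day preparing.
years_early = 2
days_per_year = 365        # the prompt assumes non-leap years
preparation_days = 1

# Days remaining in the past before the event's original date.
days_in_past = years_early * days_per_year - preparation_days
print(days_in_past)  # 729, matching the chain of thought's conclusion
```

Checking the chain's arithmetic this way is also a useful habit when authoring new CoT examples, since a flawed example teaches the model flawed reasoning.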

现在,我们来看看评估解决方案解题效果的提示。打开清单 10.9 所示的 evaluate_answer.jinja2,查看所使用的提示。该提示很简单,采用零样本提示,让 LLM 自行泛化出应如何对预期结果和预测结果进行评分。我们也可以提供示例及其评分,从而将其变成少样本分类的例子。

Now, we can look at the prompt that evaluates how well the solution solved the problem. Open evaluate_answer.jinja2, shown in listing 10.9, to review the prompt used. The prompt is simple, uses zero-shot prompting, and allows the LLM to generalize how it should score the expected and predicted. We could provide examples and scores, thus changing this to an example of a few-shot classification.

列表 10.9 evaluate_answer.jinja2
系统:

请确认对于给定的问题,预期结果
和预测结果是否一致。    #1
返回 0 到 1 之间的分数,其中 1 表示完全匹配,0 表示完全不匹配。
请只返回分数,不需要解释。    #2

用户:
问题:{{problem}}    #3

预期结果:{{expected}}    #4

预测结果:{{predicted}}    #5
system:

Please confirm that expected and predicted results are 
the same for the given problem.     #1
Return a score from 0 to 1 where 1 is a perfect match and 0 is no match.
Please just return the score and not the explanation.     #2

user:
Problem: {{problem}}     #3

Expected result: {{expected}}     #4

Predicted result: {{predicted}}     #5
#1 评估解决方案的规则

#2 仅返回分数,不返回其他任何内容

#3 初始问题陈述

#4 预期或合理的答案

#5 先前 CoT 提示的输出

查看前面清单 10.7 中所示的 LLM 输出,您就能明白为什么评估步骤可能会令人困惑。或许可以建议 LLM 用一个语句给出最终答案来解决这个问题。下一节,我们将继续讨论另一个提示推理的例子。

Looking at the LLM output shown earlier in listing 10.7, you can see why the evaluation step may get confusing. Perhaps a fix to this could be suggesting to the LLM to provide the final answer in a single statement. In the next section, we move on to another example of prompt reasoning.
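Note in listing 10.7 that the evaluation score comes back as the string "0.5", so any downstream code has to parse and bound it. A defensive sketch of that parsing step (a hypothetical helper, not part of the chapter's flow):

```python
def parse_score(reply: str, default: float = 0.0) -> float:
    """Parse an LLM-returned score string and clamp it to the range [0, 1]."""
    try:
        score = float(reply.strip())
    except ValueError:
        return default  # the model returned an explanation instead of a number
    return max(0.0, min(1.0, score))

print(parse_score("0.5"))                 # 0.5
print(parse_score("1"))                   # 1.0
print(parse_score("The answer matches"))  # 0.0
```

Clamping also protects against a model that ignores the 0-to-1 instruction and returns, say, a score out of 10.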

10.2.2零样本 CoT 提示

10.2.2 Zero-shot CoT prompting

正如我们的时间旅行示例所展示的,针对特定类型的问题,CoT 提示在提示构造上可能成本很高。虽然效果不如 CoT,但存在一些与之类似、不使用示例且更具普适性的技术。本节将考察一个用于在 LLM 中引出推理的简单短语。

As our time travel demonstrates, CoT prompting can be expensive in terms of prompt generation for a specific class of problem. While not as effective, there are techniques similar to CoT that don’t use examples and can be more generalized. This section will examine a straightforward phrase employed to elicit reasoning in LLMs.

在 VS Code 的提示流可视化编辑器中打开 prompt_flow/zero-shot-cot-prompting/flow.dag.yaml。如图 10.4 所示,此流程与之前的 CoT 流程非常相似。下一个清单给出了描述该流程的 YAML 伪代码。

Open prompt_flow/zero-shot-cot-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. This flow is very similar to the previous CoT, as shown in figure 10.4. The next listing shows the YAML pseudocode that describes the flow.

列表 10.10 zero-shot-CoT-prompting 流程
   输入:
        陈述:待解决的问题的陈述
        预期:问题的预期解决方案

   LLM:cot(用于解决问题的提示)
        输入:语句
        输出:根据问题陈述预测的答案

   LLM:evaluate_answer(用于评估解决方案的提示)
        输入:
               语句:-> 输入语句
               预期结果:-> input.expected
               预测结果:-> output.cot

         输出结果:问题解答程度的评分。


    输出:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.cot
        evaluation_score: output.evaluate_answer

    ### 示例输出
   {
       "evaluation_score": "1",    #1
       "expected": "在最后一次跳跃之后,
           马克斯发现自己身处1980年,
   当时他75岁。",    #2
       "predicted": "Max 从……出发 
           因此,在最后一次跳跃之后,
           Max 75 岁,年份是 1980 年。",    #3
       "statement": "在一个复杂的时间旅行……"    #4
   }
   Inputs:
        statement  : the statement problem to be solved
        expected : the expected solution to the problem

   LLM: cot (the prompt used to solve the problem)
        inputs: statement
        outputs: the predicted answer given the problem statement

   LLM: evaluate_answer (the prompt used to evaluate the solution)
        inputs:
               statement: -> input.statement
               expected: -> input.expected
               predicted: -> output.cot

         outputs: a score of how well the problem was answered


    Outputs:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.cot
        evaluation_score: output.evaluate_answer

    ### Example Output
   {
       "evaluation_score": "1",     #1
       "expected": "After the final jump, 
           Max finds himself in the year 1980 and 
   he is 75 years old.",     #2
       "predicted": "Max starts in… 
           Therefore, after the final jump, 
           Max is 75 years old and in the year 1980.",     #3
       "statement": "In a complex time travel …"     #4
   }
#1 最终评估分数

#2 预期答案

#3 预测答案(步骤已省略,直接显示最终答案)

#4 初始问题陈述

在 VS Code 的可视化编辑器中按 Shift-F5 运行/测试流程。流程运行后,您应该会看到与清单 10.10 中类似的输出。此练习示例在处理同一问题时比之前的示例表现更好。

Run/test the flow in VS Code by pressing Shift-F5 while in the visual editor. The flow will run, and you should see output similar to that shown in listing 10.10. This exercise example performs better than the previous example on the same problem.

Open the cot.jinja2 prompt in VS Code, as shown in listing 10.11. This prompt is much simpler than the previous example because it uses only zero-shot prompting. However, one key phrase turns this simple prompt into a powerful reasoning engine. The line Let’s think step by step in the prompt triggers the LLM to produce internal context showing its reasoning. This, in turn, directs the LLM to reason out the problem in steps.

Listing 10.11 cot.jinja2
system:
You are an expert in solving time travel problems.
You are given a time travel problem and you have to solve it.
Let's think step by step.     #1
Please finalize your answer in a single statement.     #2

user:
{{statement}}     #3
#1 The one line that triggers the LLM to show its reasoning process

#2 Asks the LLM to finalize its answer in a single statement

#3 The problem statement the LLM needs to solve

Similar phrases asking the LLM to think about the steps or asking it to respond in steps also extract reasoning. We’ll demonstrate a similar but more elaborate technique in the next section.
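As a hedged illustration (not the book's code), the zero-shot CoT prompt can be assembled programmatically; the `build_cot_messages` helper and the chat-message format below are assumptions for the sketch:

```python
# Hypothetical sketch: assemble a zero-shot chain-of-thought prompt as
# chat-style messages. The phrase "Let's think step by step" is the
# trigger that elicits step-wise reasoning from the LLM.

def build_cot_messages(statement):
    """Return chat messages pairing the CoT system prompt with a problem."""
    system = (
        "You are an expert in solving time travel problems.\n"
        "Let's think step by step.\n"
        "Please finalize your answer in a single statement."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": statement},
    ]

messages = build_cot_messages("In a time travel movie, Sarah travels back...")
```

These messages could then be sent to any chat-completion endpoint; the trigger phrase, not the transport, is what elicits the reasoning.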

10.2.3 Step by step with prompt chaining

We can extend the behavior of asking an LLM to think step by step into a chain of prompts that force the LLM to solve the problem in steps. In this section, we look at a technique called prompt chaining that forces an LLM to process problems in steps.

Open the prompt_flow/prompt-chaining/flow.dag.yaml file in the visual editor, as shown in figure 10.5. Prompt chaining breaks up the reasoning method used to solve a problem into chains of prompts. This technique forces the LLM to answer the problem in terms of steps.

Figure 10.5 The prompt chaining flow

Listing 10.12 shows the YAML pseudocode that describes the flow in a few more details. This flow chains the output of the first LLM block into the second and then from the second into the third. Forcing the LLM to process the problem this way uncovers the reasoning pattern, but it can also be overly verbose.

Listing 10.12 The prompt-chaining flow
   Inputs:
        statement  : the statement problem to be solved

   LLM: decompose_steps (the prompt used to decompose the problem)
        inputs: 
               statement: -> input.statement     #1

        outputs: the breakdown of steps to solve the problem

   LLM: calculate_steps (the prompt used to calculate the steps)
        inputs:
               statement: -> input.statement
               decompose_steps: -> output.decompose_steps     #2

               outputs: the calculation for each step
   LLM: calculate_solution (attempts to solve the problem)
        inputs:
               statement: -> input.statement
               decompose_steps: -> output.decompose_steps
               calculate_steps: -> output.calculate_steps     #3

         outputs: the final solution statement

   Outputs:
        statement: -> input.statement
        decompose_steps: -> output.decompose_steps
        calculate_steps: -> output.calculate_steps
        calculate_solution: -> output.calculate_solution

   ### Example Output
{
    "calculate_steps": "1. The days spent by Alex",
    "decompose_steps": "To figure out the …",
    "solution": "Alex spends 13 days in the 
            past before the end of the battle.",     #4
    "statement": "In a sci-fi film, Alex …"    
}
#1 The start of the prompt chain

#2 Injects the output of the previous step into this step

#3 Injects the output of the previous two steps into this step

#4 The final solution statement is wrong but closer to the correct answer.

Run the flow by pressing Shift-F5 from the visual editor, and you’ll see the output as shown in listing 10.12. The answer is still not correct for the Alex problem, but we can see all the work the LLM is doing to reason out the problem.

Open up all three prompts: decompose_steps.jinja2, calculate_steps.jinja2, and calculate_solution.jinja2 (see listings 10.13, 10.14, and 10.15, respectively). Comparing the three prompts shows how the outputs chain together.

Listing 10.13 decompose_steps.jinja2
system:
You are a problem solving AI assistant.
Your job is to break the users problem down into smaller steps and list 
the steps in the order you would solve them.
Think step by step, not in generalities.
Do not attempt to solve the problem, just list the steps. #1

user:
{{statement}}     #2
#1 Forces the LLM to list only the steps and nothing else

#2 The initial problem statement

Listing 10.14 calculate_steps.jinja2
system:
You are a problem solving AI assistant.
You will be given a list of steps that solve a problem.
Your job is to calculate the output for each of the steps in order.
Do not attempt to solve the whole problem,
just list output for each of the steps.     #1
Think step by step.     #2

user:
{{statement}}

{{steps}}     #3
#1 Asks the LLM not to solve the whole problem, only the steps

#2 Uses the magic phrase to extract the reasoning process

#3 Injects the steps generated by the decompose_steps step

Listing 10.15 calculate_solution.jinja2
system:
You are a problem solving AI assistant.
You will be given a list of steps and the calculated output for each step.
Use the calculated output from each step to determine the final 
solution to the problem.
Provide only the final solution to the problem in a 
single concise sentence. Do not include any steps 
in your answer.     #1

user:
{{statement}}

{{steps}}     #2

{{calculated}}     #3
#1 Asks the LLM to output the final answer and not any of the steps

#2 The decomposed steps

#3 The calculated steps

In this exercise example, we’re not performing any evaluation or scoring. Even without evaluation, we can see that this sequence of prompts still has problems solving the more challenging time travel problem shown earlier in figure 10.3. That doesn’t mean the technique lacks value, however; this prompting format solves some complex problems well.
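The decompose/calculate/solve chain can be sketched with a stand-in `llm` callable to show how each step's output is injected into the next prompt; `run_chain` and `fake_llm` are illustrative names, not prompt flow APIs:

```python
# Illustrative sketch of prompt chaining: each step's output is appended
# to the next step's prompt, mirroring the flow in listing 10.12.

def run_chain(llm, statement):
    steps = llm(f"List the steps to solve, do not solve:\n{statement}")
    calculated = llm(f"Calculate each step:\n{statement}\n{steps}")
    solution = llm(
        f"Give only the final solution:\n{statement}\n{steps}\n{calculated}"
    )
    return {
        "statement": statement,
        "decompose_steps": steps,
        "calculate_steps": calculated,
        "calculate_solution": solution,
    }

# A trivial echo stub records the prompts so we can see the chaining.
trace = []
def fake_llm(prompt):
    trace.append(prompt)
    return f"answer-{len(trace)}"

result = run_chain(fake_llm, "Alex travels...")
```

With the stub, the second prompt contains the first answer and the third contains both, which is the essence of the chaining pattern.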

What we want to find, however, is a reasoning and planning methodology that can solve such complex problems consistently. The following section moves from reasoning to evaluating the best solution.

10.3 Employing evaluation for consistent solutions

In the previous section, we learned that even the best-reasoned plans may not always derive the correct solution. Furthermore, we may not always have the answer to confirm if that solution is correct. The reality is that we often want to use some form of evaluation to determine the efficacy of a solution.

Figure 10.6 shows a comparison of the prompt engineering strategies that have been devised as a means of getting LLMs to reason and plan. We’ve already covered the two on the left: zero-shot direct prompting and CoT prompting. The following example exercises in this section will look at self-consistency with the CoT and ToT techniques.

Figure 10.6 Comparing the various prompt engineering strategies for getting LLMs to reason and plan

We’ll continue to focus on the complex time travel problem to compare these more advanced methods that expand on reasoning and planning with evaluation. In the next section, we’ll evaluate self-consistency.

10.3.1 Evaluating self-consistency prompting

Consistency in prompting is more than just lowering the temperature parameter we send to an LLM. Often, we want to generate a consistent plan or solution and still use a high temperature to better evaluate all the variations to a plan. By evaluating multiple different plans, we can get a better sense of the overall value of a solution.

Self-consistent prompting is the technique of generating multiple plans/solutions for a given problem. Then, those plans are evaluated, and the more frequent or consistent plan is accepted. Imagine three plans generated, where two are similar, but the third is different. Using self-consistency, we evaluate the first two plans as the more consistent answer.
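A minimal sketch of that selection step, assuming the sampled answers can be compared after simple normalization (frequency voting; the book's flow uses embedding similarity instead):

```python
from collections import Counter

# Illustrative only: pick the most frequent normalized answer from
# several high-temperature samples, as self-consistency prescribes.

def most_consistent(answers):
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    winner, _ = Counter(normalized).most_common(1)[0]
    # Return the first original answer matching the winning normalization.
    for original, norm in zip(answers, normalized):
        if norm == winner:
            return original

answer = most_consistent([
    "Max is 75 years old in 1980.",
    "Max is 75 years old in 1980.",
    "Max is 80 years old in 1975.",
])
```

Two of the three samples agree, so the agreeing answer wins the vote.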

Open prompt_flow/self-consistency-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. The flow diagram shows the simplicity of the prompt generation flow in figure 10.7. Next to it in the diagram is the self-consistency evaluation flow.

Figure 10.7 The self-consistency prompt generation flow next to the evaluation flow

Prompt flow uses a directed acyclic graph (DAG) format to execute the flow logic. DAGs are an excellent way of demonstrating and executing flow logic, but because they are acyclic, meaning they can’t repeat, they can’t execute loops. However, because prompt flow provides a batch processing mechanism, we can use that to simulate loops or repetition in a flow.

Referring to figure 10.6, we can see that self-consistency processes the input three times before collecting the results and determining the best plan/reply. We can apply this same pattern but use batch processing to generate the outputs. Then, the evaluation flow will aggregate the results and determine the best answer.
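The batch trick can be sketched in plain Python: apply the same flow once per line of a JSONL payload of identical inputs; `run_batch` is a hypothetical stand-in for prompt flow's batch runner:

```python
import json

# Sketch of emulating loop iterations with a batch of identical inputs,
# the way prompt flow's batch runner consumes a JSONL file.

def run_batch(flow, jsonl_text):
    """Apply a flow once per JSONL line, mimicking repeated runs."""
    return [flow(json.loads(line)) for line in jsonl_text.splitlines() if line]

# Five identical statements stand in for statements.jsonl.
statements = "\n".join(
    json.dumps({"statement": "Alex travels..."}) for _ in range(5)
)
outputs = run_batch(lambda row: {"predicted": row["statement"].upper()}, statements)
```

Each line produces one flow run, so five identical lines give five independent outputs to aggregate.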

Open the self-consistency-prompting/cot.jinja2 prompt template in VS Code (see listing 10.16). The listing is shortened because we’ve seen parts of it before. This prompt uses two few-shot examples of a CoT to demonstrate the thought reasoning to the LLM.

Listing 10.16 self-consistency-prompting/cot.jinja2
system:

"In a time travel movie, Sarah travels back… "     #1

Chain of Thought:

    Initial Assumption: …     #2
    Conclusion: Sarah spends 729 days in the past before the day of the event.

"In a complex time travel movie plot, Max, a 25 year old…"     #3

Chain of Thought:
    Starting Point: Max starts …     #4
    Conclusion: After the final jump, 
Max finds himself in the year 1980 and he is 75 years old.
Think step by step,
 but only show the final answer to the statement.     #5

user:
{{statement}}
#1 The Sarah time travel problem

#2 An example CoT (omitted for brevity)

#3 The Max time travel problem

#4 An example CoT (omitted for brevity)

#5 Final guidance and instructions to constrain the output

Open the self-consistency-prompting/flow.dag.yaml file in VS Code. Run the example in batch mode by clicking Batch Run (the beaker icon) from the visual editor. Figure 10.8 shows the process step by step:

  1. Click Batch Run.
  2. Select the JSON Lines (JSONL) input.
  3. Select statements.jsonl.
  4. Click the Run link.
Figure 10.8 The step-by-step process of starting a batch run

Listing 10.17 shows the JSON output from executing the flow in batch mode. The statements.jsonl file has five identical Alex time travel problem entries. Using identical entries allows us to simulate the prompt executing five times on the duplicate entry.

Listing 10.17 The self-consistency-prompting batch execution output
{
    "name": "self-consistency-prompting_default_20240203_100322_912000",
    "created_on": "2024-02-03T10:22:30.028558",
    "status": "Completed",
    "display_name": "self-consistency-prompting_variant_0_202402031022",
    "description": null,
    "tags": null,
    "properties": {
        "flow_path": "…prompt_flow/self-consistency-prompting",     #1
        "output_path": "…/.promptflow/.runs/self-
 consistency-prompting_default_20240203_100322_912000",     #2
        "system_metrics": {
            "total_tokens": 4649,
            "prompt_tokens": 3635,
            "completion_tokens": 1014,
            "duration": 30.033773
        }
    },
    "flow_name": "self-consistency-prompting",
    "data": "…/prompt_flow/self-consistency-prompting/
 statements.jsonl",     #3
    "output": "…/.promptflow/.runs/self-consistency-
 prompting_default_20240203_100322_912000/flow_outputs"
}
#1 The path of the flow execution

#2 The folder holding the output of the flow (note this path)

#3 The data used to batch-run the flow

You can view the output produced by the flow by Ctrl-clicking the output link, highlighted in listing 10.17. This will open another instance of VS Code, showing a folder with all the output from the run. We now want to check the most consistent answer. Fortunately, the evaluation feature in prompt flow can help us identify consistent answers using similarity matching.

Open self-consistency-evaluation/flow.dag.yaml in VS Code (see figure 10.7). This flow embeds the predicted answer and then uses an aggregation to determine the most consistent answer.

From the flow, open consistency.py in VS Code, as shown in listing 10.18. The code for this tool function calculates the cosine similarity for all pairs of answers. Then, it finds the most similar answer, logs it, and outputs that as the answer.

Listing 10.18 consistency.py
from promptflow import tool
from typing import List
import numpy as np
from scipy.spatial.distance import cosine
@tool
def consistency(texts: List[str],
                embeddings: List[List[float]]) -> str:
    if len(embeddings) != len(texts):
        raise ValueError("The number of embeddings "
                         "must match the number of texts.")

    mean_embedding = np.mean(embeddings, axis=0)     #1
    similarities = [1 - cosine(embedding, mean_embedding) 
                 for embedding in embeddings]     #2
    most_similar_index = np.argmax(similarities)     #3

    from promptflow import log_metric
    log_metric(key="highest_ranked_output", value=texts[most_similar_index])     #4

    return texts[most_similar_index]     #5
#1 Calculates the mean of all the embeddings

#2 Calculates the cosine similarity of each embedding to the mean

#3 Finds the index of the most similar answer

#4 Logs the output as a metric

#5 Returns the text of the most similar answer
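The same mean-embedding selection can be illustrated without the promptflow and scipy dependencies; the function name and the toy two-dimensional embeddings below are made up for the sketch:

```python
import math

# Standalone illustration of the mean-embedding trick: the answer whose
# embedding is most cosine-similar to the mean embedding is the most
# "consistent" one. Toy 2D vectors stand in for real text embeddings.

def most_similar_to_mean(texts, embeddings):
    dims = len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dims)]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    sims = [cos(e, mean) for e in embeddings]
    return texts[sims.index(max(sims))]

best = most_similar_to_mean(
    ["29 days", "29 days", "9 days"],
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
)
```

The two near-identical vectors pull the mean toward themselves, so one of the "29 days" answers wins, just as in the flow's evaluation.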

We need to run the evaluation flow in batch mode as well. Open self-consistency-evaluation/flow.dag.yaml in VS Code and run the flow in batch mode (beaker icon). Then, select Existing Run as the flow input, and when prompted, choose the top or the last run you just executed as input.

Again, after the flow completes processing, you’ll see an output like that shown in listing 10.17. Ctrl-click on the output folder link to open a new instance of VS Code showing the results. Locate and open the metric.json file in VS Code, as shown in figure 10.9.

Figure 10.9 VS Code opened to the batch run’s output folder, with the metrics.json file and the output showing the most similar answer highlighted

The answer shown in figure 10.9 is still incorrect for this run. You can continue a few more batch runs of the prompt and/or increase the number of runs in a batch and then evaluate flows to see if you get better answers. This technique is generally more helpful for more straightforward problems but still demonstrates an inability to reason out complex problems.

Self-consistency uses a reflective approach to evaluate the most likely thought. However, the most likely thought isn’t always the best one. Therefore, we must consider a more comprehensive approach in the next section.

10.3.2 Evaluating tree of thought prompting

As mentioned earlier, ToT prompting, as shown in figure 10.6, combines self-evaluation and prompt chaining techniques. As such, it breaks down the sequence of planning into a chain of prompts, but at each step in the chain, it provides for multiple evaluations. This creates a tree that can be executed and evaluated at each level, breadth-first, or from top to bottom, depth-first.

Figure 10.10 shows the difference between executing a tree using breadth-first or depth-first. Unfortunately, due to the DAG execution pattern of prompt flow, we can’t quickly implement the depth-first method, but breadth-first works just fine.
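A hedged sketch of the breadth-first ToT loop with stubbed `expand` and `score` functions; the names, threshold, and tree shape are all assumptions, not the flow's actual implementation:

```python
# Sketch of breadth-first tree of thought: at each level, expand every
# surviving thought into several candidates and keep only those whose
# evaluation score clears a threshold (pruned branches never execute).

def tot_breadth_first(root, expand, score, depth=3, width=3, threshold=25):
    level = [root]
    for _ in range(depth):
        survivors = []
        for thought in level:
            for candidate in expand(thought, width):
                if score(candidate) > threshold:
                    survivors.append(candidate)
        level = survivors
    return level

# Toy stubs: children append an index; the scorer favors the ".0" branch,
# so only one path survives each level.
survivors = tot_breadth_first(
    "p",
    expand=lambda t, w: [f"{t}.{i}" for i in range(w)],
    score=lambda t: 100 if t.endswith(".0") else 0,
)
```

With a branching factor of 3 and depth of 3, the unpruned tree would need up to 27 expansions, which is why the evaluation short-circuit matters.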

Figure 10.10 Breadth-first vs. depth-first execution in the ToT pattern

Open tree-of-thought-evaluation/flow.dag.yaml in VS Code. The visual of the flow is shown in figure 10.11. This flow functions like a breadth-first ToT pattern—the flow chains together a series of prompts asking the LLM to return multiple plans at each step.

Figure 10.11 The ToT pattern expressed in prompt flow

Because the flow executes in a breadth-first style, each level output of the nodes is also evaluated. Each node in the flow uses a pair of semantic functions—one to generate the answer and the other to evaluate the answer. The semantic function is a custom Python flow block that processes multiple inputs and generates multiple outputs.

Listing 10.19 shows the semantic_function.py tool. This general tool is reused for multiple blocks in this flow. It also demonstrates the embedding functionality from the SK for direct use within prompt flow.

Listing 10.19 semantic_function.py
@tool
def my_python_tool(
    input: str,
    input_node: int,
    history: str,
    semantic_function: str,
    evaluation_function: str,
    function_name: str,
    skill_name: str,
    max_tokens: int,
    temperature: float,
    deployment_name: str,
    connection: Union[OpenAIConnection, 
                      AzureOpenAIConnection],     #1
) -> str:
    if input is None or input == "":     #2
        return ""

    kernel = sk.Kernel(log=sk.NullLogger())
    # code for setting up the kernel and LLM connection omitted


    function = kernel.create_semantic_function(
                             semantic_function,                                               
                             function_name=function_name,
                             skill_name=skill_name,
                             max_tokens=max_tokens,
                             temperature=temperature,
                             top_p=0.5)     #3
    evaluation = kernel.create_semantic_function(
                             evaluation_function,        
                             function_name="Evaluation",
                             skill_name=skill_name,
                             max_tokens=max_tokens,
                             temperature=temperature,
                             top_p=0.5)     #4

    async def main():
        query = f"{history}\n{input}"
        try:
            eval = int((await evaluation.invoke_async(query)).result)
            if eval > 25:     #5
                return await function.invoke_async(query)    #6
        except Exception as e:
            raise Exception("Evaluation failed", e)

    try:
        result = asyncio.run(main()).result
        return result
    except Exception as e:
        print(e)
        return ""
#1 Uses a Union to support different types of LLM connections.

#2 Checks whether the input is empty or None; if so, the function shouldn’t execute.

#3 Sets up the generation function to create the plan.

#4 Sets up the evaluation function.

#5 Runs the evaluation function and determines whether the input is good enough to continue.

#6 If the evaluation score is high enough, generates the next step.

The semantic function tool is used in the tree’s experts, nodes, and answer blocks. At each step, the function determines if any text is being input. If there is no text, the block returns with no execution. Passing no text to a block means that the previous block failed evaluation. By evaluating before each step, ToT short-circuits the execution of plans it deems as not being valid.

This may be a complex pattern to grasp at first, so go ahead and run the flow in VS Code. Listing 10.20 shows just the answer node output of a run; these results may vary from what you see but should be similar. Nodes that return no text either failed evaluation or their parents did.

Listing 10.20 The tree-of-thought-evaluation flow output
{
    "answer_1_1": "",     #1
    "answer_1_2": "",
    "answer_1_3": "",
    "answer_2_1": "Alex spends a total of 29 days in the past before he 
sees the end of the battle.",
    "answer_2_2": "",     #2
    "answer_2_3": "Alex spends a total of 29 days in the past before he 
sees the end of the battle.",
    "answer_3_1": "",     #3
    "answer_3_2": "Alex spends a total of 29 days in the past before he 
sees the end of the battle.",
    "answer_3_3": "Alex spends a total of 9 days in the past before he 
sees the end of the battle.",
#1 Shows that the first node’s plans weren’t valid and weren’t executed.

#2 The plan for node 2, answer 2 failed evaluation and wasn’t executed.

#3 This node’s plan failed evaluation and wasn’t executed.

The output in listing 10.20 shows how only a select set of nodes was evaluated. In most cases, the evaluated nodes returned an answer that could be valid. Where no output was produced, it means that the node itself or its parent wasn’t valid. When sibling nodes all return empty, the parent node fails to evaluate.

As we can see, ToT is valid for complex problems but perhaps not very practical. The execution of this flow can take up to 27 calls to an LLM to generate an output. In practice, it may only do half that many calls, but that’s still a dozen or more calls to answer a single problem.

10.4 Exercises

Use the following exercises to improve your knowledge of the material:

  • Exercise 1—Create Direct Prompting, Few-Shot Prompting, and Zero-Shot Prompting

Objective —Create three different prompts for an LLM to summarize a recent scientific article: one using direct prompting, one with few-shot prompting, and the last employing zero-shot prompting.

Tasks:

    • Compare the effectiveness of the summaries generated by each approach.
    • Compare the accuracy of the summaries generated by each approach.
  • Exercise 2—Craft Reasoning Prompts

Objective —Design a set of prompts that require the LLM to solve logical puzzles or riddles.

Tasks:

    • Focus on how the structure of your prompt can influence the LLM’s reasoning process.
    • Focus on how the same can influence the correctness of its answers.
  • Exercise 3—Evaluation Prompt Techniques

Objective —Develop an evaluation prompt that asks the LLM to predict the outcome of a hypothetical experiment.

Task:

    • Create a follow-up prompt that evaluates the LLM’s prediction for accuracy and provides feedback on its reasoning process.

Summary

  • Direct solution prompting is a foundational method of using prompts to direct LLMs toward solving specific problems or tasks, emphasizing the importance of clear question-and-answer structures.
  • Few-shot prompting provides LLMs with a few examples to guide them in handling new or unseen content, highlighting its power in enabling the model to adapt to unfamiliar patterns.
  • Zero-shot learning and prompting demonstrate how LLMs can generalize from their training to solve problems without needing explicit examples, showcasing their inherent ability to understand and apply knowledge in new contexts.
  • Chain of thought prompting guides the LLMs through a reasoning process step by step to solve complex problems, illustrating how to elicit detailed reasoning from the model.
  • Prompt chaining breaks down a problem into a series of prompts that build upon each other, showing how to structure complex problem-solving processes into manageable steps for LLMs.
  • Self-consistency is a prompt technique that generates multiple solutions to a problem and selects the most consistent answer through evaluation, emphasizing the importance of consistency in achieving reliable outcomes.
  • Tree of thought prompting combines self-evaluation and prompt chaining to create a comprehensive strategy for tackling complex problems, allowing for a systematic exploration of multiple solution paths.
  • Advanced prompt engineering strategies provide insights into sophisticated techniques such as self-consistency with CoT and ToT, offering methods to increase the accuracy and reliability of LLM-generated solutions.

11 Agent planning and feedback

This chapter covers

  • Planning for an LLM and implementing it in agents and assistants
  • Using the OpenAI Assistants platform via custom actions
  • Implementing/testing a generic planner on LLMs
  • Using the feedback mechanism in advanced models
  • Planning, reasoning, evaluation, and feedback in building agentic systems

既然我们已经了解了大型语言模型(LLM)如何进行推理和规划,本章将进一步探讨这一概念,即在智能体框架内应用规划。规划应该是任何智能体/助手平台或工具包的核心。我们将首先了解规划的基础知识以及如何通过提示来实现规划器。然后,我们将使用 OpenAI Assistants 平台来了解规划是如何运作的,该平台会自动集成规划功能。最后,我们将构建并实现一个适用于大型语言模型的通用规划器。

Now that we’ve examined how large language models (LLMs) can reason and plan, this chapter takes this concept a step further by employing planning within an agent framework. Planning should be at the core of any agent/assistant platform or toolkit. We’ll start by looking at the basics of planning and how to implement a planner through prompting. Then, we’ll see how planning operates using the OpenAI Assistants platform, which automatically incorporates planning. From there, we’ll build and implement a general planner for LLMs.

计划固然重要,但反馈往往被忽视。因此,在本章的最后几节中,我们将探讨反馈机制并将其应用到计划工具中。您必须熟悉第十章的内容,如有需要,请先复习一下。准备就绪后,我们就可以开始计划了。

Planning can only go so far, and an often-unrecognized element is feedback. Therefore, in the last sections of the chapter, we explore feedback and implement it within a planner. You must be familiar with the content of chapter 10, so please review it if you need to, and when you’re ready, let’s begin planning.

11.1 规划:所有代理/助手的必备工具

11.1 Planning: The essential tool for all agents/assistants

无法进行规划、只能执行简单交互的智能体和助手,只不过是聊天机器人而已。正如本书所述,我们的目标并非构建机器人,而是构建能够自主思考的智能体——这些智能体能够接收目标,找出解决方案,并将结果返回给用户。

Agents and assistants who can’t plan and only follow simple interactions are nothing more than chatbots. As we’ve seen throughout this book, our goal isn’t to build bots but rather to build autonomous thinking agents—agents that can take a goal, work out how to solve it, and then return with the results.

图 11.1 展示了智能体/助手将要执行的总体规划流程。该图在第一章中也出现过,但现在我们来更详细地回顾一下。在图的顶部,用户提交一个目标。在智能体系统中,智能体接收目标,构建计划,执行计划,然后返回结果。

Figure 11.1 explains the overall planning process that the agent/assistant will undertake. This figure was also presented in chapter 1, but let’s review it now in more detail. At the top of the figure, a user submits a goal. In an agentic system, the agent takes the goal, constructs the plan, executes it, and then returns the results.

图 11.1 代理规划过程

根据您与 ChatGPT、GPT、Claude 等平台的互动情况,您可能已经接触过规划助手,只是没有注意到而已。规划功能正变得越来越普遍,现在大多数商业平台都内置了规划功能,以使模型看起来更加智能和强大。因此,在接下来的练习中,我们将通过一个例子来设定基准,并区分无法进行规划的学习模型和能够进行规划的智能体。

Depending on your interaction with platforms such as ChatGPT and GPTs, Claude, and others, you may have already encountered a planning assistant and not even noticed. Planning is becoming ubiquitous and is now built into most commercial platforms to make the model appear more intelligent and capable. Therefore, in the next exercise, we’ll look at an example to set a baseline and differentiate between an LLM that can’t plan and an agent that can.

接下来,我们将使用 Nexus 来演示原始 LLM 如何无法独立进行规划。如果您在安装、设置和运行 Nexus 时需要帮助,请参阅第 7 章。安装并准备就绪后,我们可以使用 Gradio 界面运行 Nexus,命令如下所示。

For the next exercise, we’ll use Nexus to demonstrate how raw LLMs can’t plan independently. If you need assistance installing, setting up, and running Nexus, refer to chapter 7. After you have Nexus installed and ready, we can begin running it with the Gradio interface, using the commands shown next.

清单 11.1 使用 Gradio 接口运行 Nexus
nexus run gradio

Gradio 是一款优秀的 Web 界面工具,专为演示 Python 机器学习项目而设计。图 11.2 展示了 Gradio Nexus 界面以及创建代理和使用您选择的代理引擎(OpenAI、Azure 和 Groq)的过程。除非模型/服务器支持工具/动作的使用,否则您无法使用 LM Studio。Anthropic 的 Claude 模型支持内部规划,因此在本练习中,请避免使用此模型。

Gradio is an excellent web interface tool built to demonstrate Python machine learning projects. Figure 11.2 shows the Gradio Nexus interface and the process for creating an agent and using an agent engine (OpenAI, Azure, and Groq) of your choice. You can’t use LM Studio unless the model/server supports tool/action use. Anthropic’s Claude supports internal planning, so for the purposes of this exercise, avoid using this model.

图 11.2 在 Nexus 中创建新代理

创建智能体之后,我们需要赋予它特定的操作(工具)来执行或完成目标。通常来说,只提供智能体完成目标所需的操作是最佳选择,原因有以下几点:

After creating the agent, we want to give it specific actions (tools) to undertake or complete a goal. Generally, providing only the actions an agent needs to complete its goal is best for a few reasons:

  • 更多的操作可能会让智能体感到困惑,难以决定使用哪种操作,甚至难以决定如何实现目标。
  • More actions can confuse an agent into deciding which to use or even how to solve a goal.
  • API 对可提交的工具数量有限制;截至撰写本文时,达到此限制相对容易。
  • APIs have limits on the number of tools that can be submitted; at the time of writing, hitting this limit is relatively easy.
  • 除非这正是你的目的,否则代理可能会以你意想不到的方式使用这些操作。但请注意,操作可能会产生后果。
  • Agents may use your actions in ways you didn’t intend unless that’s your goal. Be warned, however, that actions can have consequences.
  • 必须考虑安全问题。LLM 不会统治世界,但它们会犯错,而且很容易偏离轨道。记住,这些代理将独立运行,并且可能执行任何操作。
  • Safety and security need to be considered. LLMs aren’t going to take over the world, but they make mistakes and quickly get off track. Remember, these agents will operate independently and may perform any action.

在本练习中,我们将定义以下清单中描述的目标。

For this exercise, we’ll define the goal described in the following listing.

清单 11.2 展示规划:目标
在维基百科上搜索关于 {topic} 的页面,下载每个页面,
并将其保存到名为 Wikipedia_{topic}.txt 的文件中。
Search Wikipedia for pages on {topic} and download each page and save it 
to a file called Wikipedia_{topic}.txt

此目标将展示以下行动:

This goal will demonstrate the following actions:

  • search_wikipedia(topic)—搜索维基百科并返回给定搜索词的页面 ID。
  • search_wikipedia(topic)—Searches Wikipedia and returns page IDs for the given search term.
  • get_wikipedia_page(page_id)—根据页面 ID 下载页面内容。
  • get_wikipedia_page(page_id)—Downloads the page content given the page ID.
  • save_file—将内容保存到文件。
  • save_file—Saves the content to a file.

如图 11.3 所示,设置代理的操作。同时,请确保将“规划器”设置为“无”。我们稍后会介绍如何设置和使用规划器。您无需点击“保存”按钮;界面会自动保存代理的更改。

Set the actions on the agent, as shown in figure 11.3. You’ll also want to make sure the Planner is set to None. We’ll look at setting up and using planners soon. You don’t have to click Save; the interface automatically saves an agent’s changes.

图 11.3 选择代理的操作并禁用规划器

选择操作和规划器后,输入清单 11.2 中的目标。然后点击“Create New Thread”(创建新线程)来创建一个新的对话。在聊天输入框中替换您要搜索的主题,然后等待代理回复。以下是一个已填入主题的目标示例,您也可以使用任何喜欢的主题:

After you choose the actions and planner, enter the goal in listing 11.2. Then click Create New Thread to instantiate a new conversation. Substitute the topic you want to search for in the chat input, and wait for the agent to respond. Here’s an example of the goal filled with the topic, but again, use any topic you like:

在维基百科上搜索有关卡尔加里的页面,下载每个页面,
并将其保存到名为 Wikipedia_Calgary.txt 的文件中。
Search Wikipedia for pages on Calgary and download each page and save it to 
a file called Wikipedia_Calgary.txt.

图 11.4 显示了将目标提交给普通智能体的结果。我们可以看到,智能体执行了搜索主题的工具/操作,但无法执行后续步骤。如果您还记得我们在第 5 章中关于操作的讨论和代码示例,OpenAI、Groq 和 Azure OpenAI 都支持并行操作,但不支持顺序操作或计划操作。

Figure 11.4 shows the results of submitting the goal to the plain agent. We see the agent executed the tool/action to search for the topic but couldn’t execute any steps beyond that. If you recall from our discussion and code example of actions in chapter 5, OpenAI, Groq, and Azure OpenAI all support parallel actions but not sequential or planned actions.

图 11.4 尝试让代理/LLM 完成目标的结果

如果提交的目标包含多个并行任务/操作,LLM 可以给出较为准确的答案。但是,如果这些操作是顺序执行的,即一个步骤依赖于另一个步骤,则 LLM 将无法正常工作。请记住,并行操作是指可以与其他操作同时运行的独立操作。

The LLM can answer reasonably well if you submit a goal with several parallel tasks/actions. However, if the actions are sequential, requiring one step to be dependent on another, it will fail. Remember, parallel actions are standalone actions that can be run alongside others.
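To make the distinction concrete, here's a minimal Python sketch (the action names `search` and `get_page` are hypothetical stand-ins, not the actual Nexus tools). Parallel actions are independent of each other, while sequential actions must feed one step's output into the next — which is exactly what a plan expresses:

```python
# Minimal sketch contrasting parallel and sequential actions.
# The action names (search, get_page) are hypothetical stand-ins.

def search(topic):
    """Standalone action: returns page IDs for a topic."""
    return [101, 102]

def get_page(page_id):
    """Action that depends on a page ID from a previous step."""
    return f"content of page {page_id}"

# Parallel: independent calls that could run in any order.
results = [search("Calgary"), search("Alberta")]

# Sequential: get_page consumes search's output, so the steps
# must run in order -- a single round of parallel tool calls
# cannot express this dependency.
pages = [get_page(pid) for pid in search("Calgary")]
```

A model that only supports parallel tool calls can emit both `search` calls in one response, but it has no way to request `get_page` for IDs it hasn't seen yet — hence the failure in figure 11.4.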

Anthropic 的 Claude 和 OpenAI Assistants 都支持顺序动作规划。这意味着这两个模型都可以接受顺序规划,它们会执行这些规划并返回结果。下一节我们将探讨顺序规划,并演示其实际应用。

Anthropic’s Claude and OpenAI Assistants support sequential action planning. This means both models can be called with sequential plans, and the model will execute them and return the results. In the next section, we’ll explore sequential planning and then demonstrate it in action.

11.2 理解顺序规划过程

11.2 Understanding the sequential planning process

在接下来的练习中,我们将使用 OpenAI 助手来解决相同的目标。如果您拥有 Anthropic/Claude 的账号凭据并且已配置好引擎,也可以使用该模型尝试完成此练习。

In the next exercise, we’ll ask an OpenAI assistant to solve the same goal. If you have Anthropic/Claude credentials and have the engine configured, you can also try this exercise with that model.

图 11.5 展示了按顺序执行任务(规划)和使用迭代执行任务之间的区别。如果您使用过 GPT、智能助手或 Claude Sonnet 3.5,您可能已经体验过这种区别。这些高级工具已经通过提示标注、高级训练或两者结合的方式实现了规划。

Figure 11.5 shows the difference between executing tasks sequentially (planning) and using iteration. If you’ve used GPTs, assistants, or Claude Sonnet 3.5, you’ve likely already experienced this difference. These advanced tools already incorporate planning by prompt annotations, advanced training, or combining both.

图 11.5 迭代执行与计划执行的区别

随着 LLM 和聊天服务的不断发展,大多数模型很可能都会原生支持某种形式的规划和工具使用。然而,目前大多数模型(包括 GPT-4o)仅支持操作/工具的使用。

As LLM and chat services evolve, most models will likely natively support some form of planning and tool use. However, most models, including GPT-4o, only support action/tool use today.

让我们打开 GPT Assistants Playground 来演示顺序规划的实际应用。如果需要帮助,请参阅第 6 章的设置指南。我们将使用相同的目标,但这次将使用内置规划功能的助手来运行。

Let’s open the GPT Assistants Playground to demonstrate sequential planning in action. If you need help, refer to the setup guide in chapter 6. We’ll use the same goal but, this time, run it against an assistant (which has built-in planning).

启动 Playground 后,创建一个新的助手,并为其分配 search_wikipedia、get_wikipedia_page 和 save_file 操作。图 11.6 显示了向助手输入目标后的结果。如图所示,助手在后台完成了所有任务,并返回了用户最终请求的输出,从而实现了目标。

After you launch the Playground, create a new assistant, and assign it the search_wikipedia, get_wikipedia_page, and save_file actions. Figure 11.6 shows the results of entering the goal to the assistant. As you can see, the assistant completed all the tasks behind the scenes and responded with the user’s final requested output, achieving the goal.

图 11.6 助手处理目标并输出结果

为了展示 OpenAI Assistant 规划器的有效性,我们在目标中添加了另一个任务:总结每一页的内容。这个新增任务没有对应的函数/工具,但助手足够聪明,能够利用自身能力来总结内容。您可以打开 [root folder]assistants_working_folder/Wikipedia_{topic}.txt 文件并查看内容,了解助手生成的输出。现在我们已经了解了 LLM 在没有规划器和规划的情况下是如何工作的,接下来我们将在下一节中创建我们自己的规划器。

To demonstrate the effectiveness of the OpenAI Assistant’s planner, we added another task, summarizing each page, to the goal. The inserted task didn’t have a function/tool, but the assistant was savvy enough to use its ability to summarize the content. You can see the output of what the assistant produced by opening the [root folder]assistants_working_folder/Wikipedia_{topic}.txt file and reviewing the contents. Now that we understand how LLMs function without planners and planning, we can move on to creating our planners in the next section.

11.3 构建顺序规划器

11.3 Building a sequential planner

诸如 LangChain 和 Semantic Kernel (SK) 之类的 LLM 工具拥有许多采用不同策略的规划器。然而,编写我们自己的规划器相对容易,而且 Nexus 还支持插件式接口,允许您添加来自 LangChain 和 SK 等工具的其他规划器,或者您自己的衍生规划器。

LLM tools such as LangChain and Semantic Kernel (SK) have many planners using various strategies. However, writing our planner is relatively easy, and Nexus also supports a plugin-style interface allowing you to add other planners from tools such as LangChain and SK, or your derivatives.

规划器听起来可能很复杂,但通过结合规划和推理的提示工程策略,它们很容易实现。在第十章中,我们介绍了推理和制定计划的基础知识,现在我们可以充分利用这些技能了。

Planners may sound complicated, but they are easily implemented through prompt engineering strategies that incorporate planning and reasoning. In chapter 10, we covered the basics of reasoning and deriving plans, and now we can put those skills to good use.

清单 11.3 展示了一个基于 SK 的顺序规划器,并对其进行了扩展以包含迭代功能。类似清单中所示的提示注释规划器可以根据特定需求进行调整,也可以像清单中所示的规划器一样更加通用。此规划器使用 JSON,但规划器可以使用 LLM 可识别的任何格式,包括代码。

Listing 11.3 shows a sequential planner derived from the SK, which is extended to incorporate iteration. Prompt annotation planners like those shown in the listing can be adapted to fit specific needs or be more general like those shown. This planner uses JSON, but planners could use any format an LLM understands, including code.

清单 11.3 basic_nexus_planner.py
You are a planner for Nexus.     #1
Your job is to create a properly formatted JSON plan step by step, to 
satisfy the goal given.
Create a list of subtasks based off the [GOAL] provided.
Each subtask must be from within the [AVAILABLE FUNCTIONS] list. Do not 
use any functions that are not in the list.
Base your decisions on which functions to use from the description and the 
name of the function.
Sometimes, a function may take arguments. Provide them if necessary.
The plan should be as short as possible.
You will also be given a list of corrective, suggestive and epistemic 
feedback from previous plans to help you make your decision.
For example:

[SPECIAL FUNCTIONS]     #2
for-each
description: execute a function for each item in a list
args: 
- function: the function to execute
- list: the list of items to iterate over
- index: the arg name for the current item in the list

[AVAILABLE FUNCTIONS]
GetJokeTopics
description: Get a list ([str]) of joke topics

EmailTo
description: email the input text to a recipient
args:
- text: the text to email
- recipient: the recipient's email address. Multiple addresses may be 
included if separated by ';'.

Summarize
description: summarize input text
args:
- text: the text to summarize

Joke
description: Generate a funny joke
args:
- topic: the topic to generate a joke about

[GOAL]
"Get a list of joke topics and generate a different joke for each topic. 
Email the jokes to a friend."

[OUTPUT]
    {        
        "subtasks": [
            {"function": "GetJokeTopics"},
            {"function": "for-each",
             "args": {
                       "list": "output_GetJokeTopics",
                       "index": "topic", 
                       "function": 
                                  {
                                   "function": "Joke",
                                   "args": {"topic": "topic"}}}},
            {
             "function": "EmailTo",
              "args": {
                        "text": "for-each_output_GetJokeTopics",
                        "recipient": "friend"}}
        ]
    }
# 2 more examples are given but omitted from this listing

[SPECIAL FUNCTIONS]     #3
for-each
description: execute a function for each item in a list
args: 
- function: the function to execute
- iterator: the list of items to iterate over
- index: the arg name for the current item in the list  

[AVAILABLE FUNCTIONS]     #4
{{$available_functions}}

[GOAL]
{{$goal}}     #5

Be sure to only use functions from the list of available functions. 
The plan should be as short as possible. 
And only return the plan in JSON format.
[OUTPUT]     #6
#1 前导指令,告诉智能体如何处理示例

#2 三个(少样本)示例的开头

#3 添加 for-each 特殊迭代函数

#4 可用函数从智能体的可用函数列表中自动填充

#5 在此处插入目标

#6 智能体预期放置输出的位置

图 11.7 展示了规划提示的构建和运行过程,从构建到执行,最终将结果返回给用户。规划器的工作原理是:构建规划提示,将其提交给 LLM 以构建计划,在本地解析和执行该计划,将结果返回给 LLM 进行评估和总结,最后将最终输出返回给用户。

Figure 11.7 shows the process of building and running a planning prompt, from building to execution to finally returning the results to the user. Planners work by building a planning prompt, submitting it to an LLM to construct the plan, parsing and executing the plan locally, returning the results to an LLM to evaluate and summarize, and finally returning the final output back to the user.

图 11.7 创建和执行计划的规划过程

规划过程中的一些细微之处需要特别注意。通常情况下,规划是在独立环境中构建的,不会添加上下文历史记录。这样做是为了专注于目标,因为大多数规划提示都会消耗大量令牌。执行器中的函数通常在本地环境中执行,可能包括调用 API、执行代码,甚至运行机器学习模型。

It’s essential to notice a few subtle details about the planning process. Typically, the plan is built in isolation by not adding context history. This is done to focus on the goal because most planning prompts consume many tokens. Executing the functions within the executor is usually done in a local environment and may include calling APIs, executing code, or even running machine learning models.

清单 11.4 展示了 BasicNexusPlanner 类中 create_plan 函数的代码;LangChain 和 SK 等工具也使用了类似的模式。该过程将代理的操作加载为字符串。然后,目标和可用函数列表会通过 PromptTemplateManager 插入到规划器提示模板中,它只是模板处理代码的一个包装器。模板处理可以使用简单的正则表达式完成,但也可以使用 Jinja2、Handlebars 或 Mustache 等工具来实现更复杂的功能。

Listing 11.4 shows the code for the create_plan function from the BasicNexusPlanner class; tools such as LangChain and SK use similar patterns. The process loads the agent’s actions as a string. The goal and available functions list are then inserted into the planner prompt template using the PromptTemplateManager, which is just a wrapper for the template-handling code. Template handling is done with simple regex but can also be more sophisticated using tools such as Jinja2, Handlebars, or Mustache.
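As a rough illustration of that template handling (a sketch assuming the `{{$name}}` placeholder syntax shown in listing 11.3 — this isn't Nexus's actual PromptTemplateManager), a regex-based renderer can be only a few lines:

```python
import re

def render_prompt(template: str, context: dict) -> str:
    """Replace {{$name}} placeholders with values from the context."""
    def repl(match):
        key = match.group(1)
        # Leave unknown placeholders untouched rather than failing.
        return str(context.get(key, match.group(0)))
    return re.sub(r"\{\{\$(\w+)\}\}", repl, template)

prompt = render_prompt(
    "[GOAL]\n{{$goal}}\n[AVAILABLE FUNCTIONS]\n{{$available_functions}}",
    {"goal": "Search Wikipedia", "available_functions": "search_wikipedia"},
)
```

Tools like Jinja2 add loops, conditionals, and escaping on top of this, but for simple slot filling, plain substitution is enough.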

清单 11.4 basic_nexus_planner.py (create_plan)
def create_plan(self, nexus, agent, goal: str, prompt: str = PROMPT) -> Plan:
        selected_actions = nexus.get_actions(agent.actions)
        available_functions_string = "\n\n".join(
            format_action(action) for action in selected_actions
        )     #1

        context = {}     #2
        context["goal"] = goal
        context["available_functions"] = available_functions_string

        ptm = PromptTemplateManager()     #3
        prompt = ptm.render_prompt(prompt, context)

        plan_text = nexus.execute_prompt(agent, prompt)     #4
        return Plan(prompt=prompt, 
                    goal=goal, 
                    plan_text=plan_text)     #5
#1 加载代理的可用操作,并将结果字符串格式化以供规划器使用。

#2 上下文将被注入到规划器提示模板中。

#3 一个简单的模板管理器,概念类似于 Jinja2、Handlebars 或 Mustache。

#4 将填写好的规划器提示发送到 LLM。

#5 结果(规划)被封装在 Plan 类中并返回执行。

清单 11.5 中所示的执行计划的代码解析 JSON 字符串并执行函数。执行计划时,代码会检测特殊的 for-each 函数,该函数遍历列表并对每个元素执行指定函数。每次函数执行的结果都会添加到上下文中。此上下文会传递给每个函数调用,并作为最终输出返回。

The code to execute the plan, shown in listing 11.5, parses the JSON string and executes the functions. When executing the plan, the code detects the particular for-each function, which iterates through a list and executes each element in a function. The results of each function execution are added to the context. This context is passed to each function call and returned as the final output.

清单 11.5 basic_nexus_planner.py (execute_plan)
def execute_plan(self, nexus, agent, plan: Plan) -> str:
        context = {}
        plan = plan.generated_plan
        for task in plan["subtasks"]:     #1
            if task["function"] == "for-each":     #2
                list_name = task["args"]["list"]
                index_name = task["args"]["index"]
                inner_task = task["args"]["function"]

                list_value = context.get(list_name, [])
                for item in list_value:
                    context[index_name] = item
                    result = nexus.execute_task(agent, inner_task, context)
                    context[f"for-each_{list_name}_{item}"] = result

                for_each_output = [     #2
                    context[f"for-each_{list_name}_{item}"] 
                      for item in list_value
                ]
                context[f"for-each_{list_name}"] = for_each_output

                for item in list_value:     #3
                    del context[f"for-each_{list_name}_{item}"]

            else:
                result = nexus.execute_task(agent,
                                            task,
                                            context)     #4
                context[f"output_{task['function']}"] = result

        return context     #5
#1 遍历计划中的每个子任务

#2 处理需要迭代的函数,并将完整的结果列表添加到上下文中

#3 删除各个 for-each 循环的上下文条目

#4 执行常规任务

#5 返回完整的上下文,其中包含每次函数调用的结果
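To make the control flow concrete, here's a stripped-down, self-contained version of the same for-each logic, with a stub `execute_task` standing in for Nexus's dispatcher (the joke functions are hypothetical, echoing the example in listing 11.3):

```python
def execute_task(task, context):
    """Stub executor: dispatches on the function name."""
    fn = task["function"]
    if fn == "GetJokeTopics":
        return ["cats", "robots"]
    if fn == "Joke":
        # The arg value names a context key holding the topic.
        return f"A joke about {context[task['args']['topic']]}"
    raise ValueError(f"unknown function {fn}")

def execute_plan(plan):
    """Run subtasks in order, threading results through a context dict."""
    context = {}
    for task in plan["subtasks"]:
        if task["function"] == "for-each":
            args = task["args"]
            items = context.get(args["list"], [])
            results = []
            for item in items:
                context[args["index"]] = item
                results.append(execute_task(args["function"], context))
            context[f"for-each_{args['list']}"] = results
        else:
            context[f"output_{task['function']}"] = execute_task(task, context)
    return context

plan = {"subtasks": [
    {"function": "GetJokeTopics"},
    {"function": "for-each", "args": {
        "list": "output_GetJokeTopics",
        "index": "topic",
        "function": {"function": "Joke", "args": {"topic": "topic"}}}},
]}
context = execute_plan(plan)
```

Each step's output lands in the context under an `output_` or `for-each_` key, which is how a later step (or the final summarization call) can refer back to earlier results.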

整个执行过程返回的上下文信息会通过最后一次调用发送给 LLM,LLM 会汇总结果并返回响应。如果一切按计划进行,LLM 将返回结果摘要。如果出现错误或缺少某些内容,LLM 可能会尝试修复问题或通知用户错误情况。

The returned context from the entire execution is sent in a final call to the LLM, which summarizes the results and returns a response. If everything goes as planned, the LLM will respond with a summary of the results. If there is an error or something is missing, the LLM may try to fix the problem or inform the user of the error.

现在我们再次打开 Nexus,测试一下规划器的运行情况。加载上次使用的同一个代理,但这次在“高级选项”中选择规划器,如图 11.8 所示。然后,像之前一样输入目标提示,让代理执行该任务。

Let’s now open Nexus again and test a planner in operation. Load up the same agent you used last time, but select the planner under the Advanced options this time, as shown in figure 11.8. Then, enter the goal prompt as you did before, and let the agent take it away.

图 11.8 使用基本规划器在 Nexus 中请求完成目标的结果

几分钟后,代理程序会返回已保存的文件,在某些情况下,它还会提供额外信息,例如后续步骤以及如何处理输出结果。这是因为代理程序已获得其完成任务的概览。但请记住,计划执行是在本地层面进行的,只有上下文、计划和目标被发送到 LLM。

After a few minutes, the agent returns with the saved file, and in some cases, it may provide extra information, such as the next steps and what to do with the output. This is because the agent was given a high-level overview of what it accomplished. Remember, though, that plan execution is done at the local level, and only context, plan, and goal were sent to the LLM.

这意味着计划执行可以由任何进程完成,不一定非得由代理完成。在LLM之外执行计划可以减少代理执行计划所需的令牌和工具使用量。这也意味着LLM无需支持工具使用即可使用计划器。

This means that plan execution can be completed by any process, not necessarily by the agent. Executing a plan outside the LLM reduces the tokens and tool use the agent needs to perform. This also means that an LLM doesn’t need to support tool usage to use a planner.
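As a sketch of that final step (a hypothetical helper, not Nexus's actual code), the executed context can be flattened into a plain-text summarization prompt for the LLM, so the model never needs tool support at all:

```python
def build_summary_prompt(goal: str, context: dict) -> str:
    """Package the goal and executed-plan results for a final LLM call."""
    lines = [f"The goal was: {goal}", "Results of the executed plan:"]
    for key, value in context.items():
        lines.append(f"- {key}: {value}")
    lines.append("Summarize these results for the user.")
    return "\n".join(lines)

prompt = build_summary_prompt(
    "Search Wikipedia for pages on Calgary",
    {"output_search_wikipedia": [101, 102]},
)
```

The model only ever sees goal, plan, and results as text, which is why local execution works with any chat-capable LLM.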

在 Nexus 内部,当启用规划器时,代理引擎工具会被绕过。规划器会完成动作的执行,而代理仅通过传递输出上下文来感知这些动作。这对于支持工具使用但无法进行规划的模型来说是有利的。然而,对于既支持工具使用又支持规划的模型(例如 Claude),规划器可能会限制其功能。

Internally, when a planner is enabled within Nexus, the agent engine tool is bypassed. Instead, the planner completes the action execution, and the agent is only aware of the actions through the passing of the output context. This can be good for models that support tool use but can’t plan. However, a planner may limit functionality for models that support both tool use and planning, such as Claude.

一般来说,你需要了解所使用 LLM 的功能。如果你不确定这些细节,也可以通过一些尝试来找到答案。让代理分别在启用和禁用规划功能的情况下完成一个多步骤目标,然后查看结果。

In general, you’ll want to understand the capabilities of the LLM you’re using. If you’re unsure of those details, then a little trial and error can also work. Ask the agent to complete a multistep goal with and without planning enabled, and then see the results.

规划使智能体能够完成多个顺序任务,从而实现更复杂的目标。外部规划或提示规划的问题在于它绕过了反馈迭代循环,而反馈迭代循环有助于快速纠正问题。正因如此,OpenAI 和其他公司现在正在将推理和规划直接集成到 LLM 层,我们将在下一节中看到这一点。

Planning allows agents to complete multiple sequential tasks to achieve more complex goals. The problem with external or prompt planning is that it bypasses the feedback iteration loop, which can help correct problems quickly. Because of this, OpenAI and others are now directly integrating reasoning and planning at the LLM level, as we’ll see in the next section.

11.4 回顾逐步规划器:OpenAI Strawberry

11.4 Reviewing a stepwise planner: OpenAI Strawberry

代号为“草莓”(Strawberry)的 o1-preview 模型的发布,标志着可用于智能体系统的 LLM 类型发生了巨大转变。据称,“草莓”不仅在数学、科学和一般计算任务方面效率更高,而且能够直接在 LLM 内部进行推理、规划、评估和反馈。

The release of the o1-preview model, code named Strawberry, introduced a dramatic shift in the type of LLMs becoming available for agentic systems. Strawberry was not only proclaimed to be more efficient at math, science, and general calculation tasks but also able to engage in reasoning, planning, evaluation, and feedback directly in the LLM.

回顾一下我们在第十章中提到的时间旅行问题,如图 11.9 所示。如果你还记得,这个问题用 GPT-4 和其他类似的 LLM 很难解决。但是,通过应用推理和反馈,我们能够得到偶尔正确的输出。

Consider our time travel problem from chapter 10 and shown again in figure 11.9. If you recall, this problem was difficult to solve using GPT-4 and other similar LLMs. However, with the application of reasoning and feedback, we were able to produce output that was occasionally correct.

图 11.9 时间旅行问题再探

作为一项实验,请使用 o1-preview 模型将此问题输入到 ChatGPT 中,如清单 11.6 所示。稍等片刻,等待答案。没错,模型仍然答错了。

As an experiment, enter this problem into ChatGPT using the o1-preview model, as shown in listing 11.6. Sit back for a few seconds and wait for the answer. Yep, the model still gets it wrong.

清单 11.6 时间旅行推理/规划问题
在一部科幻电影中,亚历克斯是一位时间旅行者,他决定回到过去,
亲眼见证 100 年前发生的一场著名历史战役,
这场战役持续了 10 天。他在战役开始前三天到达。
然而,在过去待了六天之后,他向未来跳跃了 50 年,
并在那里待了 20 天。然后,他返回过去,
见证战役的结束。亚历克斯在看到战役结束之前,
总共在过去待了多少天?
In a sci-fi film, Alex is a time traveler who decides to go back in time 
to witness a famous historical battle that took place 100 years ago, 
which lasted for 10 days. He arrives three days before the battle starts. 
However, after spending six days in the past, he jumps forward in time 
by 50 years and stays there for 20 days. Then, he travels back to 
witness the end of the battle. How many days does Alex spend in the past 
before he sees the end of the battle?

虽然看到模型给出错误答案可能会有些令人失望,但它在分解问题和展示答案方面做得更好。清单 11.7 展示了将清单 11.6 中的问题交给 Strawberry 模型后的示例输出。请注意,由于 LLM 的随机性,您可能会得到不同的答案。

While it may be somewhat disappointing to see the model get the wrong answer, it does, however, do a far better job of breaking down the problem and demonstrating its answer. Listing 11.7 shows the sample output from posing the problem in listing 11.6 to the Strawberry model. Note, you may get a different answer because of the stochastic nature of the LLM.

清单 11.7 o1-preview 对时间旅行问题的回应
为了确定亚历克斯在目睹……之前在过去待了多少天
战斗结束后,让我们一步步重构时间线:
抵达过去:
亚历克斯在战斗开始前3天到达。
他花了这三天时间等待战斗开始。
战斗期间:
战争开始了,亚历克斯经历了战争的前三天。
截至目前,他总共在过去待了6天(前3天)。
这场战斗以及战斗期间的3天)。
时间快进:    #1
6天后,亚历克斯的时间跳跃到了50年后。
他在未来世界待了20天。
回到过去:
随后,亚历克斯回到过去,亲眼目睹了这场战斗的结局。
由于战斗持续了10天,而他在战斗的第3天就离开了,
这场战斗还剩7天。
然而,因为他离开的时间长达20天,这比……
在剩余的7天战斗中,他必须回到之前的某个时间点。
战斗结束,我们得以见证它的结局。
为了目睹这场战斗的结局,他回到了过去,回到了最后一天。
战斗。
战斗结束前的最后几天:
返回后,他又在过去待了一天,亲眼目睹了那场战斗。
结尾。
在看到结局之前,过去总共花费了多少天:
时间跳跃前6天。
回到过去一天后。
总计:6 + 1 = 7 天。
答案:7
To determine how many days Alex spends in the past before witnessing the 
end of the battle, let's reconstruct the timeline step by step:
Arrival in the Past:
Alex arrives 3 days before the battle begins.
He spends these 3 days waiting for the battle to start.
During the Battle:
The battle starts, and Alex experiences the first 3 days of it.
At this point, he has spent a total of 6 days in the past (3 days before 
the battle and 3 days during the battle).
Time Jump Forward:     #1
After these 6 days, Alex jumps 50 years forward in time.
He spends 20 days in this future time.
Return to the Past:
Alex then returns to the past to witness the end of the battle.
Since the battle lasts 10 days and he left after day 3 of the battle, 
there are 7 days remaining in the battle.
However, because he was away for 20 days, which is longer than the 
remaining 7 days of the battle, he must return to a point in time before 
the battle ends to witness its conclusion.
To see the end of the battle, he returns to the past on the last day of 
the battle.
Final Days Spent Before the Battle Ends:
Upon returning, he spends 1 more day in the past to witness the battle's 
end.
Total Days Spent in the Past Before Seeing the End:
6 days before jumping forward in time.
1 day after returning to the past.
Total: 6 + 1 = 7 days.
Answer: 7
#1 模型出错的地方变得显而易见。

因为我们知道正确答案是 27,所以我们知道 LLM 模型是错误的;但如果我们不知道正确答案,我们很容易就会认为整个计算过程和推理都是正确的。当我们移除 LLM 交互和智能体系统中的反馈机制时,就会出现这类问题。反馈机制可以引导模型进行自我修正。

Because we know the right answer is 27, we know the LLM is wrong, but if we didn’t, we could just as easily assume that the work and reasoning were all correct. Problems like this can happen when we remove feedback in LLM interactions and agentic systems. Feedback can guide the model to correct itself.

然而,如果我们不知道正确答案是 27(假设他没有花一整天时间观看战斗,则答案是 26),而只是假设 LLM 或智能体给出的答案是正确的,该怎么办呢?其实,我们可以通过几个简单的提示来解决这个问题,这些提示可以引导 LLM 进行推理和规划反馈。不过,这些技巧对于像 OpenAI Assistant 这样的 LLM 或封装器来说更为有效,因为它们在模型内部就提供了推理和规划功能。

However, what if we didn’t know the correct answer was 27 (26, if you assume he doesn’t spend the day to witness the battle) and assumed the LLM or agent was correct? Well, this is a problem we can rectify with a couple of simple prompts that can engage the LLM in reasoning and planning feedback. However, these techniques are more effective with LLMs or wrappers such as the OpenAI Assistants, which provide reasoning and planning within the model.

我们希望向LLM提供反馈,但理解反馈内容对我们来说可能比较困难。幸运的是,只要我们给出正确答案,就可以直接从LLM那里获得反馈。清单11.8展示了如何就我们的时间旅行问题从LLM获得建设性反馈。

What we want to do is provide feedback to the LLM, but understanding what that feedback is will likely be difficult for us. Fortunately, we can elicit feedback directly from the LLM, provided we give the correct answer. Listing 11.8 shows how to generate constructive feedback from the LLM concerning our time travel problem.

清单 11.8 生成反馈
正确答案是 27,请检查你的错误之处并提出建议。
在尝试解决未来类似问题时,你可以给自己一些反馈。
the correct answer is 27, please review what you did wrong and suggest 
feedback you could give yourself when trying to solve similar future problems

在模型回答错误后输入此信息,即可生成反馈,您可以利用这些反馈通过提示或作为系统指令的一部分来指导模型。清单 11.9 展示了 o1-preview 提供的反馈示例。下次需要处理复杂的时间旅行问题时,您可以提取此反馈并完善指令。

Entering this after the model answers the question wrong will generate feedback that you can use to guide the model through prompting or as part of system instructions. Listing 11.9 shows an example of the feedback provided by o1-preview. You can then extract this feedback and augment the instructions the next time you want to tackle complex time travel problems.

列表 11.9生成的反馈
对未来问题的反馈
理解时间旅行中的时间框架:
相对时间范围:始终相对于原始参考点来考虑各个时间段。在这个问题中,即使 Alex 向前跳跃了 50 年,相对于他最初的“现在”,他仍然处于过去。
包括所有相关时间段:
确保将在过去(或未来)度过的所有时间段都计入总数,即使其中涉及移动到不同的时间点。
时间连续性:
请记住,当旅行者穿越时空时,除非另有说明,其他时间段的事件仍会继续向前推进。
应用:
仔细阅读:
请仔细阅读题目提供的细节,例如各段持续时间和时间跳跃的顺序。
可视化:
绘制时间轴或许有助于形象化地展现事件顺序和所涉及的时间段。
复核计算结果:
解题之后,重新检查每个步骤,确保所有时间段都已正确计入。
Feedback for Future Problems
Understanding Time Frames in Time Travel:
Relative Time Frames: Always consider the time periods relative to the 
original point of reference. In this problem, even though Alex jumps 
forward 50 years, he is still in the past relative to his original 
present time.
Including All Relevant Periods: 
Ensure that all time periods spent in the past (or future) are included 
in the total calculation, even if they involve moving to different 
points in time.
Time Continuity: 
Remember that while the traveler moves through time, events in other 
time periods continue to progress unless specified otherwise.
Application:
Careful Reading: 
Pay close attention to the details provided, such as the durations and 
the sequences of time jumps.
Visualization: 
It may help to draw a timeline to visualize the sequence of events and 
time periods involved.
Double-Checking Calculations: 
After solving, revisit each step to ensure all time periods are 
accounted for correctly.
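In application code, this correct-then-elicit loop amounts to appending the Listing 11.8 prompt to the conversation history before querying the model again. A minimal sketch follows; the message format is the OpenAI chat convention, while `build_feedback_turn` is a hypothetical helper name, not part of any library:

```python
# Sketch: append the Listing 11.8 corrective prompt to a chat history before
# re-querying the model. The message format follows the OpenAI chat
# convention; build_feedback_turn is a hypothetical helper name.

FEEDBACK_TEMPLATE = (
    "the correct answer is {answer}, please review what you did wrong and "
    "suggest feedback you could give yourself when trying to solve similar "
    "future problems"
)

def build_feedback_turn(history, correct_answer):
    """Return a new history ending with the feedback-eliciting prompt."""
    turn = {"role": "user",
            "content": FEEDBACK_TEMPLATE.format(answer=correct_answer)}
    return history + [turn]

history = [
    {"role": "user", "content": "How many years does Alex spend in the past?"},
    {"role": "assistant", "content": "Alex spends 22 years in the past."},  # wrong
]
messages = build_feedback_turn(history, 27)
```

The returned `messages` list can then be sent back to the model, which responds with feedback such as that shown in Listing 11.9.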

这种反馈技术对诸如 o1-preview 之类的模型始终有效,但其他模型即使有了这种反馈,可能仍然难以给出正确答案。随着模型越来越智能,这种技术很可能适用于大多数模型。然而,即使模型越来越智能,这种反馈机制也可能仍然至关重要,因为语言是微妙的,我们向 LLM 提出的问题并非都有显而易见的绝对答案。以我们给出的示例问题为例,它很好地说明了解题者需要做出假设并从问题中建立关联。在从地质学到行为科学的许多科学领域中,同一个问题的答案仍可能多种多样。接下来,让我们看看如何将推理、规划、评估和反馈应用于智能体系统。

This feedback technique will consistently work on models such as o1-preview, but other models may still struggle to answer correctly, even given this feedback. Over time, as models become smarter, this technique will likely work on most of them. However, this feedback mechanism will likely remain essential even as models get progressively smarter, because language is nuanced, and not every problem we challenge LLMs with has an obvious absolute answer. Take our example problem, for instance. It's an excellent example of a problem that requires the solver to make assumptions and draw correlations from the question. There are still plenty of areas in science, from geology to behavioral science, where answering the same problem may yield a range of answers. Let's look next at how reasoning, planning, evaluation, and feedback can be applied to agentic systems.

11.5将规划、推理、评估和反馈应用于辅助系统和代理系统

11.5 Applying planning, reasoning, evaluation, and feedback to assistant and agentic systems

在前几章中,我们探讨了如何实现规划、推理、反馈和评估等智能体组成部分。现在,我们将探讨如何、何时以及在何处将这些组成部分集成到用于实时生产、研究或开发的辅助和智能体系统中。

In recent chapters, we’ve examined how the agentic components of planning, reasoning, feedback, and evaluation can be implemented. Now we look at how, when, and where those components can be integrated into assistant and agentic systems for real-time production, research, or development.

虽然并非所有组件都适用于所有应用场景,但了解何时何地应用哪个组件仍然十分有用。下一节,我们将探讨如何将规划功能集成到助手/代理系统中。

While not all of these components may fit the same into every application, it’s useful to understand where and when to apply which component. In the next section, we look at how planning can be integrated into assistant/agentic systems.

11.5.1助理/代理人规划的应用

11.5.1 Application of assistant/agentic planning

规划是指助手或代理能够计划执行一系列任务的组件,无论这些任务是串行、并行还是其他组合。我们通常将规划与工具的使用联系起来,而且理所当然地,任何使用工具的系统都需要一个功能强大的规划器。然而,并非所有系统都具有相同的功能,因此在表 11.1 中,我们将回顾在何处、何时以及如何实施规划器。

Planning is the component where an assistant or agent can plan to undertake a set of tasks, whether they are in series, parallel, or some other combination. We typically associate planning with tool use, and, rightfully, any system using tools will likely want a capable planner. However, not all systems are created equally, so in table 11.1, we’ll review where, when, and how to implement planners.

表 11.1规划在各种应用中何时何地被采用和使用

应用 | 实施 | 环境 | 目的 | 时机 | 配置
个人助理 | 在提示中或 LLM 内 | 个人设备 | 促进工具使用 | 响应过程中 | 作为提示或 LLM 的一部分
客服机器人 | 不常见;受限环境 | 受限环境,不使用工具 | — | — | —
自主代理 | 作为代理提示的一部分,并在 LLM 内 | 服务器或服务 | 促进复杂工具的使用和任务规划 | 构建代理时和/或响应期间 | 在代理或 LLM 内部
协作工作流程 | 作为 LLM 的一部分 | 共享画布或编码 | 促进复杂工具的使用 | 响应过程中 | 在 LLM 内
游戏AI | 作为 LLM 的一部分 | 服务器或应用程序 | 复杂工具的使用和规划 | 响应之前或期间 | 在 LLM 内
研究 | 任何地方 | 服务器 | 促进工具使用并参与复杂的任务工作流程 | 响应生成之前、期间和之后 | 任何地方

表 11.1 展示了几个不同的应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中运用规划:

Table 11.1 shows several varied application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how planning may be employed in each application:

  • 个人助理——虽然这项应用推广缓慢,但LLM个人助理有望在未来超越Alexa和Siri。对于这些新型助理/代理而言,规划至关重要,它们需要协调众多复杂任务,并串行或并行执行各种工具(操作)。
  • Personal assistant—While this application has been slow to roll out, LLM personal assistants promise to surpass Alexa and Siri in the future. Planning will be essential to these new assistants/agents to coordinate numerous complex tasks and execute tools (actions) in series or parallel.
  • 客服机器人——由于这种环境的可控性,直接与客户互动的客服助手很可能只会使用受控且非常具体的工具。这意味着这类助手可能不需要大量的规划。
  • Customer service bot—Due to the controlled nature of this environment, assistants engaged directly with customers will likely have controlled and very specific tool use. This means that these types of assistants will likely not require extensive planning.
  • 自主智能体——正如我们在前几章中所看到的,具备规划能力的智能体可以完成一系列复杂的任务以实现各种目标。规划将是任何自主智能体系统不可或缺的要素。
  • Autonomous agent—As we’ve seen in previous chapters, agents with the ability to plan can complete a series of complex tasks for various goals. Planning will be an essential element of any autonomous agentic system.
  • 协作式工作流程——可以将其视为与程序员或撰稿人并肩工作的代理或助手。虽然这些工作流程仍处于早期开发阶段,但可以设想这样一种工作流程:代理会自动被分配任务,与开发人员一起编写和执行测试代码。规划将是执行这些未来复杂工作流程的关键环节。
  • Collaborative workflows—Think of these as agents or assistants that sit alongside coders or writers. While these workflows are still in early development, think of a workflow where agents are automatically tasked with writing and executing test code alongside developers. Planning will be an essential part of executing these complex future workflows.
  • 游戏AI——尽管将 LLM 应用于游戏仍处于早期阶段,但我们不难想象游戏中会出现能够协助或挑战玩家的智能体或助手。赋予这些智能体规划和执行复杂工作流程的能力,可能会彻底改变我们玩游戏的方式和对象。
  • Game AI—While applying LLMs to games is still in early stages, it isn’t hard to imagine in-game agents or assistants that can assist or challenge the player. Giving these agents the ability to plan and execute complex workflows could disrupt how and with whom we play games.
  • 研究——与协作工作流程类似,这些代理人将负责从现有信息来源中提炼新想法。信息的获取很可能需要广泛使用工具,而协调规划将对此大有裨益。
  • Research—Similar to collaborative workflows, these agents will be responsible for deriving new ideas from existing sources of information. Finding that information will likely be facilitated through extensive tool use, which will benefit from coordination of planning.
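To make the planning component concrete, the sketch below shows one common pattern: the planner prompt asks the LLM to return its plan as structured JSON, and the agent then executes the steps in order. The `steps`/`tool`/`args` schema, the stub tools, and the `prior` convention for chaining results are hypothetical choices for illustration, not a standard planner format:

```python
import json

# Sketch: a planner prompt can ask the LLM to return its plan as structured
# JSON, which the agent then executes step by step. The "steps"/"tool"/"args"
# schema and the stub tools below are hypothetical, for illustration only.

PLAN_JSON = """
{"goal": "Summarize a topic to a file",
 "steps": [
   {"tool": "fetch", "args": {"topic": "time travel"}},
   {"tool": "save", "args": {"path": "summary.txt"}}
 ]}
"""

saved = {}

def fetch(topic, prior=None):
    return f"(stub) notes about {topic}"   # stands in for a real lookup tool

def save(path, prior=None):
    saved[path] = prior                    # stub: in-memory "file" write
    return path

TOOLS = {"fetch": fetch, "save": save}

def run_plan(raw_plan):
    """Parse the JSON plan and execute its steps sequentially,
    passing each step's result to the next as `prior`."""
    result = None
    for step in json.loads(raw_plan)["steps"]:
        result = TOOLS[step["tool"]](**step["args"], prior=result)
    return result

output = run_plan(PLAN_JSON)
```

Because each step receives the previous result, the same loop handles sequential dependencies; a parallel planner would instead dispatch independent steps together.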

如您所见,规划是许多LLM应用的关键组成部分,无论是通过协调工具的使用,还是其他方式。下一节,我们将探讨推理的下一个组成部分,以及如何将其应用于同一应用栈。

As you can see, planning is an essential part of many LLM applications, whether through coordination of tool use or otherwise. In the next section, we look at the next component of reasoning and how it can be applied to the same application stack.

11.5.2辅助/代理推理的应用

11.5.2 Application of assistant/agentic reasoning

推理虽然通常与规划和任务完成密切相关,但它本身也可以是一个独立的组件。随着 LLM 的成熟和智能化,推理功能通常会被集成到 LLM 本身。然而,并非所有应用都能从全面的推理中获益,因为它往往会在 LLM 响应中引入一个思考环节。表 11.2 概述了如何将推理组件集成到各种 LLM 应用类型中。

Reasoning, while often strongly associated with planning and task completion, is a component that can also stand by itself. As LLMs mature and get smarter, reasoning is often included within the LLM itself. However, not all applications may benefit from extensive reasoning, as it often introduces a thinking cycle within the LLM response. Table 11.2 describes at a high level how the reasoning component can be integrated with various LLM application types.

表 11.2推理在各种应用中何时何地被运用

应用 | 实施 | 环境 | 目的 | 时机 | 配置
个人助理 | 在 LLM 内 | 个人设备 | 将工作分解为步骤 | 响应过程中 | 作为提示或 LLM 的一部分
客服机器人 | 不常见;通常只提供信息 | 受限的工具使用,需要复合工具 | — | — | —
自主代理 | 作为代理提示的一部分,并在 LLM 内 | 服务器或服务 | 促进复杂工具的使用和任务规划 | 作为 LLM 的一部分;不使用外部推理 | 在代理或 LLM 内部
协作工作流程 | 作为 LLM 的一部分 | 共享画布或编码 | 协助分解工作 | 响应过程中 | 在 LLM 内
游戏AI | 作为 LLM 的一部分 | 服务器或应用程序 | 对执行复杂行动至关重要 | 响应之前或期间 | 在 LLM 内
研究 | 任何地方 | 服务器 | 理解如何解决复杂问题并参与复杂的任务工作流程 | 响应生成之前、期间和之后 | 任何地方

表 11.2 展示了几个不同的应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中运用推理:

Table 11.2 shows several varied application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how reasoning may be employed in each application:

  • 个人助理——根据应用场景的不同,智能体所采用的推理量可能会受到限制。推理需要 LLM 把问题想清楚,因此往往需要更长的响应时间,具体取决于问题的复杂性和提示的详尽程度。在许多情况下,对于希望接近实时的响应,推理可能会被关闭或调低。虽然这可能会限制智能体交互的复杂程度,但有限的推理或完全不推理可以缩短响应时间,并提升用户体验。
  • Personal assistant—Depending on the application, the amount of reasoning an agent employs may be limited. Reasoning is a process that requires the LLM to think through a problem, and this often requires longer response times depending on the complexity of the problem and the extent of the prompt. In many situations, reasoning may be disabled or turned down for responses intended to be closer to real time. While this may limit the complexity at which an agent can interact, limited or no reasoning can improve response times and increase user enjoyment.
  • 客户服务机器人——同样,由于这种环境的受控性质,直接与客户互动的助手不太可能需要执行复杂的或任何形式的推理。
  • Customer service bot—Again, because of the controlled nature of this environment, it’s unlikely that assistants engaged directly with customers will need to perform complex or any form of reasoning.
  • 自主代理——虽然推理是自主代理的重要组成部分,但我们仍然不清楚推理的过度程度如何界定。随着诸如 Strawberry 之类的模型应用于代理工作流程,我们可以评估在什么情况下不需要进行大量的推理。对于定义完善的自主代理工作流程而言,情况无疑会如此。
  • Autonomous agent—While reasoning is a strong component of autonomous agents, we still don’t know how much reasoning is too much. As models such as Strawberry become available for agentic workflows, we can gauge at what point extensive reasoning may not be needed. This will surely be the case for well-defined autonomous agent workflows.
  • 协作工作流程——同样,应用推理功能会增加LLM交互的开销。大量的推理可能对某些工作流程有益,而对其他定义明确的工作流程则可能不利。这意味着这类工作流程可能需要多个代理——包括具备推理功能的代理和不具备推理功能的代理。
  • Collaborative workflows—Again, applying reasoning creates an overhead in the LLM interaction. Extensive reasoning may provide benefits for some workflows, while other well-defined workflows may suffer. This may mean that these types of workflows will benefit from multiple agents—those with reasoning and those without.
  • 游戏AI——与其他应用类似,重度推理可能并不适合大多数游戏AI。游戏尤其要求 LLM 快速响应,这无疑会限制推理在通用战术智能体中的应用。当然,这并不排除使用其他可能提供更强战略控制能力的推理智能体。
  • Game AI—Similar to other applications, heavy-reasoning applications may not be appropriate for most game AIs. Games will especially require LLM response times to be quick, which will surely limit the application of reasoning for general tactical agents. Of course, that doesn't preclude the use of other reasoning agents that may provide more strategic control.
  • 研究——出于多种原因,推理对于任何复杂的研究任务都可能至关重要。一个很好的例子是 Strawberry 模型的应用,我们已经在数学和科学领域的研究中看到了它的身影。
  • Research—Reasoning will likely be essential to any complex research task for several reasons. A good example is the application of the Strawberry model, which we’ve already seen in research done in mathematics and the sciences.
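Because extensive reasoning trades latency for quality, an application can expose it as a switch in the system instructions, enabling it for research-style agents and disabling it for real-time ones. The sketch below illustrates the idea; the instruction wording is illustrative, not taken from any particular product:

```python
# Sketch: reasoning adds latency, so an application can expose it as a
# switch in the system instructions. The instruction wording here is
# illustrative, not taken from any particular product or API.

def make_system_prompt(role_description, enable_reasoning=True):
    prompt = f"You are {role_description}."
    if enable_reasoning:
        prompt += (" Before answering, reason step by step about the problem"
                   " and show your reasoning.")
    else:
        prompt += " Answer directly and concisely, without showing reasoning."
    return prompt

fast = make_system_prompt("a game NPC tactical assistant", enable_reasoning=False)
deep = make_system_prompt("a research assistant", enable_reasoning=True)
```

A tactical game agent would use the `fast` variant for quick responses, while a research agent would use `deep` to encourage step-by-step reasoning.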

虽然我们通常将推理与规划结合起来考虑,但在某些情况下,二者的实施层级可能有所不同。下一节我们将探讨评估各种应用程序的智能体支柱。

While we often consider reasoning in tandem with planning, there may be conditions where the level at which each is implemented may differ. In the next section we consider the agent pillar of evaluation of various applications.

11.5.3评估在智能体系统中的应用

11.5.3 Application of evaluation to agentic systems

评估是智能体/助手系统中能够衡量系统表现的组件。虽然我们已经演示了如何在一些智能体工作流程中集成评估,但评估通常是智能体系统的外部组件。不过,它也是大多数 LLM 应用的核心组件,在大多数开发过程中都不应被忽视。表 11.3 概述了如何将评估组件集成到各种 LLM 应用类型中。

Evaluation is the component of agentic/assistant systems that can guide how well the system performs. While we demonstrated incorporating evaluation in some agentic workflows, evaluation is often an external component in agentic systems. However, it’s also a core component of most LLM applications and not something that should be overlooked in most developments. Table 11.3 describes at a high level how the evaluation component can be integrated with various LLM application types.

表 11.3评估在各种应用中何时何地被采用和使用

应用 | 实施 | 环境 | 目的 | 时机 | 配置
个人助理 | 外部 | 服务器 | 评估系统的运行表现 | 互动之后 | 通常在外部开发
客服机器人 | 外部 | 服务器 | 评估每次互动的成功率 | 互动之后 | 在代理系统外部
自主代理 | 外部或内部 | 服务器或服务 | 在任务完成后或完成过程中评估系统的成功程度 | 互动之后 | 外部或内部
协作工作流程 | 外部 | 共享画布或编码 | 评估协作的成功程度 | 互动之后 | 外部服务
游戏AI | 外部或内部 | 服务器或应用程序 | 评估代理,或评估策略或行动的成功程度 | 互动之后 | 外部,或作为该代理或其他代理的一部分
研究 | 人工与 LLM 结合 | 服务器和人工 | 评估所产出的研究成果 | 生成输出之后 | 取决于问题的复杂程度和所进行的研究

表 11.3 展示了几个不同的应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中进行评估:

Table 11.3 shows several varied application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how evaluation may be employed in each application:

  • 个人助理——在大多数情况下,评估组件将用于处理和指导代理的响应性能。在主要采用检索增强生成(RAG)进行文档探索的系统中,评估结果会显示助理对信息请求的响应效果如何。
  • Personal assistant—In most cases, an evaluation component will be used to process and guide the performance of agent responses. In systems primarily employing retrieval augmented generation (RAG) for document exploration, the evaluation indicates how well the assistant responds to information requests.
  • 客服机器人——评估服务机器人对于了解机器人响应客户请求的效果至关重要。在许多情况下,强大的 RAG 知识库可能是系统中需要进行广泛且持续评估的要素。同样,与大多数评估组件一样,该组件位于主运行系统之外,通常作为针对多项指标监控整体性能的一部分运行。
  • Customer service bot—Evaluating service bots is critical to understanding how well the bot responds to customer requests. In many cases, a strong RAG knowledge element may be an element of the system that will require extensive and ongoing evaluation. Again, with most evaluation components, this element is external to the main working system and is often run as part of monitoring general performance over several metrics.
  • 自主代理——在大多数情况下,人工审核代理的输出是衡量自主代理成功与否的主要依据。然而,在某些情况下,内部评估可以帮助代理在执行复杂任务时进行指导,或者作为改进最终输出的一种手段。CrewAI 和 AutoGen 等多代理系统就是利用内部反馈来改进生成输出的自主代理的例子。
  • Autonomous agent—In most cases, a manual review of agent output will be a primary guide to the success of an autonomous agent. However, in some cases, internal evaluation can help guide the agent when it’s undertaking complex tasks or as a means of improving the final output. Multiple agent systems, such as CrewAI and AutoGen, are examples of autonomous agents that use internal feedback to improve the generated output.
  • 协作式工作流程——在大多数直接应用中,这类工作流程中会持续进行人工评估。用户通常会通过评估输出结果,立即且近乎实时地纠正助手/代理。与自主代理类似,也可以添加其他代理,以构建更复杂的协作式工作流程。
  • Collaborative workflows—In most direct cases, manual evaluation is ongoing within these types of workflows. A user will often immediately and in near real time correct the assistant/agent by evaluating the output. Additional agents could be added similarly to autonomous agents for more extensive collaborative workflows.
  • 游戏AI——评估通常分为开发评估(评估智能体如何与游戏互动)和游戏内评估(评估智能体完成任务的成功程度)。后一种评估方式的实现与自主智能体类似,但旨在改进某些策略或执行方式。此类游戏内评估也可能受益于记忆功能和反馈机制。
  • Game AI—Evaluation will often be broken down into development evaluation—evaluating how the agent interacts with the game—and in-game evaluation, evaluating how well an agent succeeded at a task. Implementing the latter evaluation form is similar to autonomous agents but aims to improve some strategies or execution. Such in-game evaluations would also likely benefit from memory and a means of feedback.
  • 研究——这一层面的评估通常是在研究任务完成后由人工完成的。智能体可以采用类似于自主智能体的评估方式来改进生成的输出,甚至可以在内部思考如何扩展或进一步研究对输出的评估。由于这目前是智能体开发的一个新领域,其执行效果如何还有待观察。
  • Research—Evaluation at this level generally occurs as a manual effort after completing the research task. An agent could employ some form of evaluation similar to autonomous agents to improve the generated output, perhaps even contemplating internally how evaluation of the output could be extended or further researched. Because this is currently a new area for agentic development, how well this will be executed remains to be seen.
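An external evaluation component of the kind described above often replays logged interactions against a simple metric. The sketch below uses keyword matching as a toy metric standing in for richer measures such as LLM-as-judge or RAG groundedness scoring; the log format is hypothetical:

```python
# Sketch: an external evaluation pass over logged agent interactions.
# Keyword matching is a toy metric standing in for real measures
# (LLM-as-judge, RAG groundedness, etc.); the log schema is hypothetical.

def score_response(response, required_keywords):
    """Fraction of required keywords present in the response."""
    hits = sum(1 for kw in required_keywords if kw.lower() in response.lower())
    return hits / len(required_keywords)

def evaluate_log(log):
    """Average score across all logged interactions."""
    scores = [score_response(e["response"], e["expected_keywords"]) for e in log]
    return sum(scores) / len(scores)

log = [
    {"response": "Your refund was issued and will arrive in 5 days.",
     "expected_keywords": ["refund", "5 days"]},
    {"response": "Please contact support.",
     "expected_keywords": ["refund", "timeline"]},
]
average = evaluate_log(log)
```

Running such a pass after interactions, rather than inside them, matches the external placement shown for most applications in table 11.3.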

评估是任何智能体或助手系统不可或缺的组成部分,尤其当该系统向用户提供真实而重要的信息时更是如此。为智能体和助手开发评估系统或许足以单独成书。在本章最后一节中,我们将探讨各种 LLM 应用中的反馈实现。

Evaluation is an essential element to any agentic or assistant system, especially if that system provides real and fundamental information to users. Developing evaluation systems for agents and assistants is likely something that could or should have its own book. In the final section of this chapter, we’ll look at feedback implementation for various LLM applications.

11.5.4将反馈应用于代理/助理应用程序

11.5.4 Application of feedback to agentic/assistant applications

作为智能体系统的一个组成部分,反馈通常(即便并非总是)以外部组件的形式实现——至少目前如此。或许随着人们对评估系统的信心不断增强,反馈机制能够被常规地整合到此类系统中。表 11.4 展示了如何在各种学习管理系统(LLM)应用中实现反馈。

Feedback as a component of agentic systems is often, if not always, implemented as an external component—at least for now. Perhaps confidence in evaluation systems may improve to the point where feedback is regularly incorporated into such systems. Table 11.4 showcases how feedback can be implemented into various LLM applications.

表 11.4反馈在各种应用中何时何地被采用和使用

应用 | 实施 | 环境 | 目的 | 时机 | 配置
个人助理 | 外部或由用户提供 | 聚合到服务器,或作为系统的一部分 | 提供系统改进的手段 | 互动之后或期间 | 内部和外部
客服机器人 | 外部 | 聚合到服务器 | 量化并提供系统改进的途径 | 互动之后 | 在代理系统外部
自主代理 | 外部 | 在服务器端聚合 | 提供系统改进的途径 | 互动之后 | 外部
协作工作流程 | 互动过程中 | 共享画布或编码 | 提供即时反馈机制 | 互动过程中 | 外部服务
游戏AI | 外部或内部 | 服务器或应用程序 | 作为内部评估反馈的一部分,支持动态改进 | 互动之后或期间 | 外部,或作为该代理或其他代理的一部分
研究 | 人工与 LLM 结合 | 服务器和人工 | 评估所产出的研究成果 | 生成输出之后 | 取决于问题的复杂程度和所进行的研究

表 11.4 列出了几个应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中使用反馈:

Table 11.4 shows several application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how feedback may be employed in each application:

  • 个人助理——如果助理或代理通过聊天式界面与用户互动,用户可以立即提供直接反馈。无论这种反馈是否会在后续对话或互动中持续存在,它通常都会被存储在代理的记忆中。像 ChatGPT 这样的助理现在都具备记忆功能,可以从明确的用户反馈中受益。
  • Personal assistant—If the assistant or agent interacts with the user in a chat-style interface, direct and immediate feedback can be applied by the user. Whether this feedback is sustained over future conversations or interactions, it usually develops within agentic memory. Assistants such as ChatGPT now incorporate memory and can benefit from explicit user feedback.
  • 客服机器人——用户或系统反馈通常在交互结束后通过调查问卷的形式提供。这意味着反馈通常会被汇总到一个外部系统,以便后续改进。
  • Customer service bot—User or system feedback is typically provided through a survey after the interaction has completed. This usually means that feedback is relegated to an external system that aggregates the feedback for later improvements.
  • 自主代理——与机器人类似,自主代理的反馈通常被放到代理完成任务之后,由用户进行评审。由于许多方面具有主观性,反馈可能更难获取。本章探讨的反馈生成方法可用于改进提示工程。
  • Autonomous agent—Much like bots, feedback within autonomous agents is typically relegated to after the agent has completed a task, which a user then reviews. The feedback mechanism may be harder to capture because many things can be subjective. Methods explored in this chapter for producing feedback can be used within prompt engineering improvements.
  • 协作工作流程——与个人助理类似,这类应用程序可以从用户即时直接的反馈中获益。同样,如何在会话之间保存这些信息通常是智能体记忆的一种实现方式。
  • Collaborative workflows—Similar to the personal assistant, these types of applications can benefit from immediate and direct feedback from the user. Again, how this information is persisted across sessions is often an implementation of agentic memory.
  • 游戏AI——反馈可以通过多个智能体与评估机制结合使用。这种反馈形式可以是一次性的,存在于当前交互过程中,也可以作为记忆保存。想象一下,一个游戏AI能够评估自身的行为,利用反馈改进行为,并记住这些改进。虽然这种模式并非游戏的理想选择,但它无疑会提升游戏体验。
  • Game AI—Feedback can be implemented alongside evaluation through additional and multiple agents. This feedback form may again be single-use and exist within the current interaction or may persist as memory. Imagine a game AI that can evaluate its actions, improve those with feedback, and remember those improvements. While this pattern isn’t ideal for games, it will certainly improve the gameplay experience.
  • 研究——与研究中的评估类似,反馈通常在输出评估完成后离线进行。虽然一些研究已经采用多智能体系统,将评估和反馈智能体整合在一起,但这些系统的性能并不总是理想,至少对于目前最先进的模型而言是如此。因此,为了避免常见的反馈循环问题,通常最好将反馈和评估放在最后进行。
  • Research—Similar to evaluation in the context of research, feedback is typically performed offline after the output is evaluated. While some development has been done using multiple agent systems incorporating agents for evaluation and feedback, these systems don’t always perform well, at least not with the current state-of-the-art models. Instead, it’s often better to isolate feedback and evaluation at the end to avoid the common feedback looping problem.
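The external, after-the-fact feedback described for bots and autonomous agents is typically aggregated before it is acted on. The sketch below groups post-interaction survey ratings by topic; the record schema is hypothetical:

```python
from collections import defaultdict

# Sketch: aggregating post-interaction survey feedback outside the agent
# system, grouped by topic, for later improvement work. The record schema
# is hypothetical.

def aggregate_feedback(records):
    totals = defaultdict(lambda: [0, 0])   # topic -> [sum of ratings, count]
    for r in records:
        totals[r["topic"]][0] += r["rating"]
        totals[r["topic"]][1] += 1
    return {topic: s / c for topic, (s, c) in totals.items()}

records = [
    {"topic": "refunds", "rating": 4},
    {"topic": "refunds", "rating": 2},
    {"topic": "shipping", "rating": 5},
]
averages = aggregate_feedback(records)
```

Topics with low average ratings then become candidates for prompt, tool, or knowledge-base improvements in the next release.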

反馈是智能体和辅助系统的另一个重要组成部分,但并非在首次发布时就必须提供。然而,从长远来看,引入严格的反馈和评估机制能够极大地促进智能体系统的持续监控,并增强其改进系统各个方面的信心。

Feedback is another powerful component of agentic and assistant systems, but it’s not always required on the first release. However, incorporating rigorous feedback and evaluation mechanisms can greatly benefit agentic systems in the long term concerning ongoing monitoring and providing the confidence to improve various aspects of the system.

在您的代理系统中,如何实现这些组件,部分取决于您所选代理平台的架构。现在您已经了解了每个组件的细微差别,也就具备了选择适合您的应用和业务用例的代理系统的知识。无论您的应用是什么,几乎在所有情况下,您都需要使用多个代理组件。

How you implement each of these components in your agentic systems may, in part, be guided by the architecture of your chosen agentic platform. Now that you understand the nuances of each component, you also have the knowledge to guide you in selecting the right agent system that fits your application and business use case. Regardless of your application, you’ll want to employ several agentic components in almost all cases.

随着智能体系统的成熟和 LLM 本身智能化程度的不断提高,我们今天视为外部的一些组件可能会被紧密集成。我们已经看到推理和规划功能被集成到诸如 Strawberry 这样的模型中。可以肯定的是,随着我们逐渐接近理论上的通用人工智能里程碑,我们或许会看到能够进行长期自我评估和反馈的模型。

As agentic systems mature and LLMs themselves get smarter, some of the components we today consider external may be closely integrated. We’ve already seen reasoning and planning be integrated into a model such as Strawberry. Certainly, as we approach the theoretical artificial general intelligence milestone, we may see models capable of performing long-term self-evaluation and feedback.

总之,我希望你们和我一起享受了这段探索新兴技术的奇妙旅程,这项技术必将改变我们对工作的看法以及我们通过代理人开展工作的方式。

In any case, I hope you enjoyed this journey with me into this incredible frontier of a new and emerging technology that will certainly alter our perception of work and how we undertake it through agents.

11.6练习

11.6 Exercises

通过以下练习来加深你对相关内容的理解:

Use the following exercises to improve your knowledge of the material:

  • 练习 1——实现一个简单的规划代理(入门级)
  • Exercise 1—Implement a Simple Planning Agent (Beginner)

目标 ——学习如何使用提示来生成一系列操作,从而实现一个基本的规划代理。

Objective —Learn how to implement a basic planning agent using a prompt to generate a sequence of actions.

任务:

Tasks:

    • 创建一个代理,接收目标,将其分解为多个步骤,并按顺序执行这些步骤。
    • 设定一个简单的目标,例如从维基百科检索信息并将其保存到文件中。
    • 使用基本规划器提示实现代理(请参阅第 11.3 节中的规划器示例)。
    • 运行代理程序,并评估其规划和执行每个步骤的效果。
    • Create an agent that receives a goal, breaks it into steps, and executes those steps sequentially.
    • Define a simple goal, such as retrieving information from Wikipedia and saving it to a file.
    • Implement the agent using a basic planner prompt (refer to the planner example in section 11.3).
    • Run the agent, and evaluate how well it plans and executes each step.
  • 练习 2——规划代理中的测试反馈集成(中级)
  • Exercise 2—Test Feedback Integration in a Planning Agent (Intermediate)

目标 ——了解反馈机制如何提高智能体系统的性能。

Objective —Understand how feedback mechanisms can improve the performance of an agentic system.

任务:

Tasks:

    • 修改练习 1 中的智能体,使其在每个任务完成后都包含一个反馈回路。
    • 利用反馈意见调整或纠正序列中的下一个任务。
    • 通过给智能体布置更复杂的任务(例如从多个来源收集数据)来测试它,并观察反馈如何提高它的性能。
    • 记录并比较代理在添加反馈前后的行为。
    • Modify the agent from exercise 1 to include a feedback loop after each task.
    • Use the feedback to adjust or correct the next task in the sequence.
    • Test the agent by giving it a more complex task, such as gathering data from multiple sources, and observe how the feedback improves its performance.
    • Document and compare the agent’s behavior before and after adding feedback.
  • 练习 3——并行和顺序规划实验(中级)
  • Exercise 3—Experiment with Parallel and Sequential Planning (Intermediate)

目标——了解并行动作和顺序动作之间的区别,以及它们如何影响智能体的行为。

Objective—Learn the difference between parallel and sequential actions and how they affect agent behavior.

任务:

Tasks:

    • 使用 Nexus 设置两个代理:一个并行执行任务,另一个顺序执行任务。
    • 定义一个多步骤目标,其中一些操作取决于先前操作的结果(顺序执行),而另一些操作可以同时执行(并行执行)。
    • 比较两个代理的性能和输出,注意在需要顺序步骤时并行执行过程中出现的任何错误或效率低下之处。
    • Set up two agents using Nexus: one that executes tasks in parallel and another that performs tasks sequentially.
    • Define a multistep goal where some actions depend on the results of previous actions (sequential), and some can be done simultaneously (parallel).
    • Compare the performance and output of both agents, noting any errors or inefficiencies in parallel execution when sequential steps are required.
  • 练习 4 — 在 Nexus 中构建和集成自定义规划器(高级)
  • Exercise 4—Build and Integrate a Custom Planner into Nexus (Advanced)

目标 ——学习如何构建自定义规划器并将其集成到代理平台中。

Objective —Learn how to build a custom planner and integrate it into an agent platform.

任务:

Tasks:

    • 使用第 11.3 节中的提示工程策略编写自定义规划器,确保其支持顺序任务执行。
    • 将此规划器集成到 Nexus 中,并创建一个使用它的代理。
    • 使用涉及多个步骤和工具(例如,数据检索、处理和保存)的复杂目标来测试规划器。
    • 评估自定义计划表与 Nexus 或其他平台内置计划表相比的性能。
    • Write a custom planner using prompt engineering strategies from section 11.3, ensuring it supports sequential task execution.
    • Integrate this planner into Nexus, and create an agent that uses it.
    • Test the planner with a complex goal that involves multiple steps and tools (e.g., data retrieval, processing, and saving).
    • Evaluate how the custom planner performs compared to built-in planners in Nexus or other platforms.
  • 练习 5——在顺序计划中实现错误处理和反馈(高级)
  • Exercise 5—Implement Error Handling and Feedback in Sequential Planning (Advanced)

目标 ——学习如何在智能体系统中实现错误处理和反馈,以改进顺序规划。

Objective —Learn how to implement error handling and feedback to refine sequential planning in an agentic system.

任务:

Tasks:

    • 使用顺序规划器,设置代理来执行可能会遇到常见错误(例如,API 调用失败、数据缺失或输入无效)的目标。
    • 在规划器中实现错误处理机制,以识别和应对这些错误。
    • 添加反馈循环,以便根据遇到的错误调整计划或重试操作。
    • 通过在执行过程中故意造成错误来测试系统,并观察代理如何恢复或调整其计划。
    • Using a sequential planner, set up an agent to perform a goal that may encounter common errors (e.g., a failed API call, missing data, or invalid input).
    • Implement error-handling mechanisms in the planner to recognize and respond to these errors.
    • Add feedback loops to adjust the plan or retry actions based on the error encountered.
    • Test the system by deliberately causing errors during execution, and observe how the agent recovers or adjusts its plan.
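As a starting point for exercise 5, the sketch below wraps an action in a bounded retry loop that records each error as feedback for the next attempt. The flaky fetch function simulates a failing API call; all names here are illustrative:

```python
# Sketch for exercise 5: wrap an action in error handling with a bounded
# retry loop, collecting each error message as feedback. The flaky action
# simulates an API call that fails on its first two attempts.

calls = {"count": 0}

def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:                # fail on the first two attempts
        raise ConnectionError(f"could not reach {url}")
    return f"data from {url}"

def run_with_retries(action, args, max_retries=3):
    feedback = []
    for attempt in range(1, max_retries + 1):
        try:
            return action(**args), feedback
        except Exception as err:
            feedback.append(f"attempt {attempt} failed: {err}")
    raise RuntimeError("all retries exhausted; feedback: " + "; ".join(feedback))

result, notes = run_with_retries(flaky_fetch, {"url": "https://example.com"})
```

In a full agent, the collected `notes` could be fed back into the planner prompt so the next plan accounts for the failure.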

小结

Summary

  • 规划对于智能体和助手来说至关重要,它使它们能够设定目标,将其分解为若干步骤并逐一执行。如果没有规划,智能体就只能进行类似聊天机器人的简单交互。
  • Planning is central to agents and assistants, allowing them to take a goal, break it into steps, and execute them. Without planning, agents are reduced to simple chatbot-like interactions.
  • 智能体必须区分并行动作和顺序动作。许多 LLM 可以处理并行动作,但只有高级模型才支持顺序规划,而顺序规划对于完成复杂任务至关重要。
  • Agents must differentiate between parallel and sequential actions. Many LLMs can handle parallel actions, but only advanced models support sequential planning, critical for complex task completion.
  • 反馈对于引导智能体修正方向、提升绩效至关重要。本章将阐述如何将反馈机制与智能体相结合,以改进其决策过程。
  • Feedback is crucial in guiding agents to correct their course and improve performance over time. This chapter demonstrates how feedback mechanisms can be integrated with agents to refine their decision-making processes.
  • OpenAI Assistants 和 Anthropic 的 Claude 等平台支持内部规划,并能执行复杂的多步骤任务。使用这些平台的智能体可以利用顺序行动规划来实现复杂的工作流程。
  • Platforms such as OpenAI Assistants and Anthropic’s Claude support internal planning and can execute complex, multistep tasks. Agents using these platforms can use sequential action planning for sophisticated workflows.
  • 正确选择和限制代理的操作对于避免混乱和意外行为至关重要。过多的操作可能会使代理不堪重负,而不必要的工具则可能被滥用。
  • Properly selecting and limiting agent actions is vital to avoid confusion and unintended behavior. Too many actions may overwhelm an agent, while unnecessary tools may be misused.
  • Nexus 提供了一个灵活的界面,允许用户创建和管理代理,他们可以在该界面中实现自定义规划器、设置目标并分配工具。本章包含使用 Nexus 的实际示例,重点展示了原始 LLM 和规划器增强型代理之间的区别。
  • Nexus allows for creating and managing agents through a flexible interface, where users can implement custom planners, set goals, and assign tools. The chapter includes practical examples using Nexus to highlight the difference between a raw LLM and a planner-enhanced agent.
  • 使用提示工程策略,编写自定义规划器非常简单。诸如 LangChain 和 Semantic Kernel 之类的工具提供了各种规划器,可以对其进行调整或扩展,以满足特定的智能体需求。
  • Writing custom planners is straightforward, using prompt engineering strategies. Tools such as LangChain and Semantic Kernel offer a variety of planners that can be adapted or extended to fit specific agentic needs.
  • OpenAI Strawberry 等模型将推理、规划、评估和反馈直接集成到 LLM 中,从而提供更准确的问题解决能力。
  • Models such as OpenAI Strawberry integrate reasoning, planning, evaluation, and feedback directly into the LLM, offering more accurate problem-solving capabilities.
  • 评估有助于确定代理系统的性能,并且可以根据用例在内部或外部实施。
  • Evaluation helps determine how well an agentic system is performing and can be implemented internally or externally, depending on the use case.
  • 随着 LLM 的发展,推理、规划和反馈机制可能会深入集成到模型中,从而为更自主、更智能的代理系统铺平道路。
  • As LLMs evolve, reasoning, planning, and feedback mechanisms may become deeply integrated into models, paving the way for more autonomous and intelligent agent systems.

附录 A 访问 OpenAI 大型语言模型

appendix A Accessing OpenAI large language models

尽管市面上有很多商业化的大型语言模型 (LLM) 服务,但本书建议直接使用 OpenAI 服务或通过 Azure OpenAI Studio 使用。要访问这些服务,您必须创建一个帐户并注册一种付款方式(本附录中未提及)。GPT-4 系列 LLM 被认为是同类最佳,更适合用于智能体开发。使用开源或其他服务始终是一种选择,但通常建议在您使用 GPT-4 一段时间后再考虑。

Although several commercial large language model (LLM) services are available, this book recommends using OpenAI services directly or through Azure OpenAI Studio. To access either service, you must create an account and register a payment method (not covered in this appendix). The GPT-4 family of LLMs is considered best in class and better suited for agent development. Using open source and alternative services is always an option but generally only advisable after you've worked with GPT-4 for some time.

A.1访问 OpenAI 帐户和密钥

A.1 Accessing OpenAI accounts and keys

以下几个通用步骤可以帮助您快速设置,以便使用 OpenAI LLM 进行智能体开发。虽然使用 OpenAI 和其他商业 LLM 需要付费,但完成本书中的所有练习预计花费不到 100 美元:

The following general steps can help you quickly set up using OpenAI LLMs for agent development. Though using OpenAI and other commercial LLMs comes at a price, you can expect to pay less than US$100 to complete all the exercises in this book:

  1. 访问https://openai.com并登录,或者注册一个账号并登录。如果您是首次创建账号,您可能会获得一定数量的免费额度。如果您已有账号,则必须注册付款方式和类型。通常建议一次性购买多个额度,这样可以更好地控制成本,避免超支。
  2. Go to https://openai.com and log in, or register for an account and log in. If this is your first time creating an account, you’ll likely be given free credit in some amount. If you already have an account, you must register a payment method and type. It’s generally better to purchase a number of credits at a time. This will allow you to manage the costs better and avoid overruns.
  3. 登录平台后,选择 ChatGPT 或 API,如图 A.1 所示。选择 API。
  4. After logging in to the platform, select ChatGPT or the API, as shown in figure A.1. Choose the API.
数字
图 A.1选择 OpenAI 平台的 API 部分
  1. 3.打开左侧菜单,选择 API 密钥选项,如图 A.2 所示。
  2. 3. Open the left menu, and select the API Keys option, as shown in figure A.2.
数字
图 A.2选择 API 密钥选项
  1. 4.单击“创建”按钮创建新密钥,输入密钥名称,然后单击“创建密钥”按钮,如图 A.3 所示。
  2. 4. Click the Create button to create a new key, enter a name for the key, and click the Create Secret Key button, as shown in figure A.3.
数字
图 A.3创建密钥 API
  1. 5.使用复制按钮将密钥复制并粘贴到记事本或其他安全位置,如图 A.4 所示。务必保守此密钥的秘密,并确保它仅保留在您的开发计算机上。
  2. 5. Copy and paste the key to a notepad or another area for safekeeping using the Copy button, as shown in figure A.4. Keep this key secret, and ensure it remains only on your development machine.
Figure A.4 Copying and pasting the key to a known safe location

After generating a key, you can continue to use it within a .env configuration file or through other means of registering an OpenAI key. For most of the packages used in this book, configuring OpenAI will generally only require the key. Other services, such as Azure OpenAI, will also require the configuration of a model deployment and a base URL, as covered in the next section.
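
To make the .env convention concrete, here's a minimal sketch of loading a key from such a file into the process environment. In practice, a package such as python-dotenv handles this for you; the file contents and the variable name OPENAI_API_KEY below follow the common convention, and the key value is a placeholder, not a real credential:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader; a sketch of what python-dotenv does for you."""
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks and comments
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip().strip('"')
    os.environ.update(loaded)  # make the values visible to client libraries
    return loaded

# Example with a placeholder key (never commit a real key to source control)
with open(".env", "w") as f:
    f.write('OPENAI_API_KEY="sk-your-key-here"\n')

load_env()
print(os.environ["OPENAI_API_KEY"])  # sk-your-key-here
```

Most OpenAI client libraries will pick up the OPENAI_API_KEY environment variable automatically once it's set this way.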

A.2 Azure OpenAI Studio, keys, and deployments

Through its ongoing relationship with OpenAI, Microsoft hosts the same models at the same price within Azure OpenAI Studio. Occasionally, Azure may be a model version behind, but Microsoft generally keeps current with the latest OpenAI models.

These guidelines will be more general because there are several ways to access Azure and methods of creating accounts and accessing the studio (for specific instructions, refer to Microsoft documentation):

  1. Log in to your Azure portal account subscription.
  2. Create a new Azure OpenAI Studio resource in a region that makes sense to you. At the time of writing, not all regions provided access to all models, so you may need to check which models are available for your region first. Availability will also be specific to your account and usage.

Within Azure OpenAI, models are exposed through a resource allocation called a deployment. Deployments wrap a model, such as GPT-4, and provide access to the resource. Figure A.5 shows an example of various models being exposed through deployments.

Figure A.5 Models deployed through Azure OpenAI Studio
  3. Click the Create New Deployment button to create a new deployment, and then select the model you want to deploy.
  4. After the model is wrapped in a deployment, you must access the parent Azure OpenAI resource. From there, you can access the key, endpoint, or base URL needed to configure your connection, as shown in figure A.6.
Figure A.6 Getting the key and base URL used to access the service

Again, if you get stuck, the Microsoft documentation can guide you in the right direction. The three critical differences to remember when connecting to a resource such as Azure OpenAI Studio or another LLM using the OpenAI tooling are listed here:

  • The api key to access the model
  • The base url or endpoint where the model is located
  • The name of the model or the deployment name
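
As a sketch, the three settings can be compared side by side. The dictionary keys below mirror the argument names used by the openai Python package's v1 clients; all values are hypothetical placeholders, not working credentials or resource names:

```python
# Placeholder values only; substitute your own key, resource, and deployment.
openai_config = {
    "api_key": "sk-your-key-here",             # key from platform.openai.com
    "base_url": "https://api.openai.com/v1",   # default; usually omitted
    "model": "gpt-4",                          # the model name itself
}

azure_config = {
    "api_key": "your-azure-key",               # key from the Azure resource
    "azure_endpoint": "https://your-resource.openai.azure.com",  # base URL
    "model": "your-deployment-name",  # the deployment name, not the model name
}

def target(config):
    """Return the (endpoint, model) pair a request would be sent to."""
    endpoint = config.get("base_url") or config.get("azure_endpoint")
    return endpoint, config["model"]

print(target(openai_config))
print(target(azure_config))
```

Note the key difference for Azure: the "model" slot carries your deployment name, because the deployment wraps the underlying model.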

If you can’t access a model for whatever reason, a good alternative is open source models. Setting up and consuming open source LLMs is covered in chapter 2.

appendix B Python development environment

While this book assumes readers are experienced Python developers, this could mean many different things. In this appendix, we look at configuring a Python development environment that will function with the code examples in this book. You can use other integrated development environments (IDEs), but not all tooling, especially extensions, will work in all IDEs.

B.1 Downloading the source code

To download and run the source code, install Git, and then pull the repository locally. Here are the high-level steps to pull the code from the book’s GitHub repository:

  1. Install Git if you need to. Git can be installed from multiple sources, but a good option is the main release, found here: https://git-scm.com/downloads. Follow the instructions to download and install the tool for your operating system.
  2. Open a terminal in the folder you want to download the source to, and then enter the following command:
git clone https://github.com/cxbxmxcx/GPT-Agents.git
  3. After the code is downloaded, you can begin by opening the chapter folder that you're working on in Visual Studio Code (VS Code). If you need to install VS Code or want to understand how to load a chapter folder as a workspace, consult section B.5 in this appendix.

B.2 Installing Python

Python is provided through different versions and deployments. This book relies on the standard Python installation, version 3.10. Anaconda is another deployment of Python that is very popular and could be used. However, all the material in this book has been run and tested with a Python 3.10 virtual environment:

  1. Go to www.python.org/downloads/.
  2. Locate and download the latest release of Python 3.10 for your operating system.
  3. Install the release on your machine using the instructions for your operating system.
  4. To confirm your installation, open a terminal, and execute the following command:
python --version

The version should be 3.10, but if it isn’t, don't worry. You may have multiple Python versions installed. We’ll also confirm the installation when setting up VS Code.
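
When several Python versions are installed, it can also help to check from inside Python exactly which interpreter a script runs under. This short standard-library sketch shows both the interpreter path and the version:

```python
import sys

# Which interpreter is running, and what version is it?
print(sys.executable)        # full path to the Python binary in use
print(sys.version_info[:3])  # e.g., (3, 10, 11)

if sys.version_info[:2] < (3, 10):
    print("Warning: this book's examples target Python 3.10 or later.")
```

Running this inside an activated virtual environment should show a path under that environment's folder rather than the system-wide installation.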

B.3 Installing VS Code

Installing VS Code is relatively straightforward and can be done in just a few steps:

  1. Go to https://code.visualstudio.com.
  2. Download a stable release of VS Code for your operating system.
  3. After the release is downloaded, follow the installation instructions for your operating system.
  4. Launch VS Code for your operating system, and make sure no warnings or errors appear. If you encounter problems, try restarting your computer and/or reinstalling.

With VS Code running, we can install the necessary extensions. We’ll cover those extensions next.

B.4 Installing VS Code Python extensions

Thousands of extensions for VS Code can provide an excellent Python coding environment. The recommended ones are only the start of what you can explore independently. Beware, though, that not all extensions are created equally. When installing new extensions, look at the number of installs and ratings. Extensions with fewer than four stars are generally to be avoided. To install the extensions, follow these steps:

  1. Launch VS Code, and open the Extensions panel, as shown in figure B.1.
Figure B.1 Installing VS Code extensions
  2. Install the following list of extensions:
    • Python, for environment and language support
    • Python Extension Pack, for covering other extensions
    • Python Environment Manager, for managing environments
    • Python Indent, for code formatting
    • Flake8, for code formatting/linting
    • Prompt Flow, for testing LLM prompts
    • Semantic Kernel Tools, for working with the Semantic Kernel framework
    • Docker, for managing Docker containers
    • Dev Containers, for running development environments with containers

You’ll only need to install the extensions for each VS Code environment you’re running. Typically, this will mean installing for just your operating system installation of VS Code. However, if you run VS Code in containers, you must install extensions for each container you’re running. Working with Python in the Dev Containers extension will be covered later in this appendix.

B.5 Creating a new Python environment with VS Code

When developing Python projects, you often want to create isolated virtual environments. This will help in managing multiple package dependencies across various tasks and tools. In this book, it’s recommended that a new virtual environment be created for each new chapter. VS Code can help you create and manage multiple Python environments quickly and efficiently via the following steps:

  1. Press Ctrl-Shift-P (Cmd-Shift-P) to open the command panel, and select Python: Create Environment, as shown in figure B.2.
Figure B.2 The steps for setting up a chapter virtual environment
  2. Select the environment type, either Venv or Conda. This book demonstrates Venv, but Conda should also work.
  3. Select the Python installation. The code in this book has been run with Python 3.10 at a minimum. The agent tools and frameworks featured in this book are cutting edge, so they should support later versions of Python.
  4. Check that the requirements.txt file in the chapter folder is selected. This will install all the requirements for the current chapter.

You should complete these steps for each new chapter of the book. The alternative is to use VS Code development containers, which will be covered in the next section.
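
If you prefer to script the environment setup rather than use the VS Code dialog, the standard-library venv module produces the same result. This sketch creates an environment programmatically; the folder name .venv is a common convention (not mandated by the book), and the requirements install is left as a comment because it depends on the chapter's requirements.txt:

```python
import venv
from pathlib import Path

# Create a virtual environment in ./.venv, including pip.
# Equivalent to running `python -m venv .venv` in a terminal.
builder = venv.EnvBuilder(with_pip=True, clear=True)
builder.create(".venv")

# The interpreter now lives under .venv (Scripts/ on Windows, bin/ elsewhere)
env_dir = Path(".venv")
print(env_dir.exists())  # True

# Next steps, run in a terminal after activating the environment:
#   pip install -r requirements.txt
```

After activating the environment, VS Code's interpreter picker should list it alongside any others it detects.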

B.6 Using VS Code Dev Containers (Docker)

When working with advanced agents and agents that can generate and execute code, running them in isolated containers is generally recommended. Container isolation prevents operating system disruption or corruption and provides a base for deploying agents.

Getting fully familiar with containers and platforms such as Docker can be an extensive undertaking. Fortunately, it takes very little knowledge to start using containers, and VS Code extensions make this even more accessible.

You’ll first need to install a container toolset. Docker is free (provided you use the tool as a hobby or you’re a student) and the most accessible. Follow these instructions to install Docker and get started working with containers:

  1. Go to the Docker Desktop download page at www.docker.com/products/docker-desktop.
  2. Download and install Docker for your operating system. Follow any other instructions as requested.
  3. Launch the Docker desktop application. Completing this step will confirm you have Docker installed and working as expected.
  4. Open VS Code, and confirm that the Docker extensions listed in section B.4 are installed.

With Docker and VS Code configured, you can move on to using Dev Containers by following these steps:

  1. Open a new instance of VS Code.
  2. Select to open a remote window, as shown in figure B.3.
Figure B.3 Opening a remote window for containers in VS Code
  3. Select Open Folder in Container to start a container from a folder, or select New Dev Container to start without a folder.

After the container is launched, your VS Code environment will be connected. This allows you to develop code on the container without worrying about dependencies not working.
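
A Dev Container is described by a devcontainer.json file in a .devcontainer folder at the root of the workspace. The following is a hypothetical minimal configuration, not taken from the book's repository: the container name, image tag, extension IDs, and post-create command are assumptions you would adapt to your chapter folder:

```json
{
  "name": "gpt-agents-chapter",
  "image": "mcr.microsoft.com/devcontainers/python:3.10",
  "postCreateCommand": "pip install -r requirements.txt",
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "ms-azuretools.vscode-docker"
      ]
    }
  }
}
```

With a file like this in place, Open Folder in Container builds the image, installs the chapter requirements, and adds the listed extensions inside the container automatically.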

index

SYMBOLS

A

actions

agent planning

agent platforms

agents

B

C

D

E

F

feedback

functions

G

GPTs (Generative Pretrained Transformers)

I

J

K

L

LM Studio

M

N

Nexus

O

OpenAI

P

profiles

Python

Q

R

S

Streamlit

T

U

V

VS Code

W

X

X (formerly Twitter)

Y

YouTube

Z